Skip to content

[Bug] [Mysql] Job not get more data after restart when running snapshot #616

@xxntti3n

Description

@xxntti3n

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Flink CDC Snapshot Processing & Restart Issue

Scenario

  • Flink CDC job performs a snapshot phase to incrementally read chunks of a database table.
  • Doris backend is available and receiving data normally.
  • After a job restart, only 2 to 3 snapshot splits are processed.
  • Then the job stops receiving further snapshot splits and does not resume processing the latest incremental chunks.

Job Log

2025-09-23 08:17:55,482 INFO org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] - Received resource requirements from job 2e93b210fd6dd1dbdb4ce167609dbe16: [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, numberOfRequiredSlots=1}]
2025-09-23 08:17:55,483 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: MySQL Source -> Sink: Writer -> Sink: Committer (1/1) (99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18) switched from SCHEDULED to DEPLOYING.
2025-09-23 08:17:55,483 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: MySQL Source -> Sink: Writer -> Sink: Committer (1/1) (attempt #18) with attempt id 99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18 and vertex id cbc357ccb763df2852fee8c4fc7d55f2_0 to database-name-7-taskmanager-1-1 @ 10.81.16.5 (dataPort=34259) with allocation id 39dedae0c787f2ab59a05ba00b92ca9c
2025-09-23 08:17:55,497 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: MySQL Source -> Sink: Writer -> Sink: Committer (1/1) (99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18) switched from DEPLOYING to INITIALIZING.
2025-09-23 08:17:55,563 INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source Source: MySQL Source registering reader for parallel task 0
2025-09-23 08:17:55,583 INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source Source: MySQL Source received split request from parallel task 0
2025-09-23 08:17:55,584 INFO org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator [] - The enumerator starts to assign split to subtask 0
2025-09-23 08:17:55,585 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: MySQL Source -> Sink: Writer -> Sink: Committer (1/1) (99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18) switched from INITIALIZING to RUNNING.
2025-09-23 08:17:55,585 INFO org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator [] - The enumerator assigns split MySqlSnapshotSplit{tableId=database_name.table_name, splitId='database_name.table_name:199', splitKeyType=[id BIGINT NOT NULL], splitStart=[1941445], splitEnd=[1951201], highWatermark=null} to subtask 0
2025-09-23 08:18:17,794 INFO org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator [] - The enumerator under INITIAL_ASSIGNING receives finished split offsets FinishedSnapshotSplitsReportEvent{finishedOffsets={database_name.table_name:165={ts_sec=0, file=mysql-bin.008601, pos=10726803, kind=SPECIFIC, gtids=08360076-238b-11ed-a6d6-42010a6aa024:1-185507642,
7c68f178-0aea-11f0-9529-42010a6aa0d4:1-197266132,
ba3cd672-fc20-11eb-af4e-42010a6aa002:1-122293766, row=0, event=0}}} from subtask 0.
2025-09-23 08:18:17,794 INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source Source: MySQL Source received split request from parallel task 0 (#18)
2025-09-23 08:18:17,794 INFO org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator [] - The enumerator starts to assign split to subtask 0
2025-09-23 08:18:17,795 INFO org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator [] - The enumerator assigns split MySqlSnapshotSplit{tableId=database_name.table_name, splitId='database_name.table_name:200', splitKeyType=[id BIGINT NOT NULL], splitStart=[1951201], splitEnd=[1960957], highWatermark=null} to subtask 0
2025-09-23 08:22:27,845 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 4 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1758615747799 for job 2e93b210fd6dd1dbdb4ce167609dbe16.
2025-09-23 08:22:28,735 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed checkpoint 4 for job 2e93b210fd6dd1dbdb4ce167609dbe16 (569104 bytes, checkpointDuration=649 ms, finalizationTime=287 ms).
2025-09-23 08:22:28,735 INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Marking checkpoint 4 as completed for source Source: MySQL Source.`

What's Wrong?

doris-flink-connector 25.1.0

What You Expected?

How to Reproduce?

How to Reproduce the Bug

  1. Start a Flink CDC job that is currently in the snapshot phase, connected to a Doris backend which is up and running.
  2. Terminate the Doris backend during the snapshot processing.
  3. Restart the Doris backend to make it available again.
  4. Observe that the Flink CDC job processes 2 to 3 snapshot splits successfully after Doris restarts.
  5. Repeat steps 2 and 3 (terminate and restart Doris) about 4 to 5 times.
  6. After these repeated Doris restarts, the Flink CDC job will stop fetching additional snapshot splits and will not resume processing the latest incremental chunks.

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions