
[Bug] [Engine] Frequent "TaskGroupLocation: TaskGroupLocation xxx already exists" error during task execution, causing task failure. #8566

kwonder0926 opened this issue Jan 21, 2025 · 2 comments


kwonder0926 commented Jan 21, 2025

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

This issue occurs on both versions 2.3.8 and 2.3.9.
From the logs, the task starts executing normally for a short period, then a worker suddenly disconnects for a reason I cannot determine, which leads to a timeout exception. The worker nodes then reinitialize the cluster and the master node resubmits the task to them. At that point an exception is thrown saying the task group already exists, and the task fails.

I will attach the key parts of my logs below for further investigation.

SeaTunnel Version

2.3.8, 2.3.9

SeaTunnel Config

env {
  # You can set SeaTunnel environment configuration here
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  MongoDB-CDC {
    hosts = "10.xxx.xxx.xx:27017,10.xxx.xxx.xx:27017,10.xxx.xxx.xxx:27017"
    database = ["letterMsg"]
    collection = ["letterMsg.2024-05-send_detail","letterMsg.2024-04-send_detail"]
    username = ******
    password = "*************"
    connection.options = "replicaSet=rep1&connectTimeoutMS=300000" 
    tables_configs = [
      {
        schema {
          table = "letterMsg.2024-05-send_detail"
          fields {
            _id = STRING,
            order_id = STRING,
            phone_num = STRING,
            message_context = STRING,
            channel_no = STRING,
            send_status = STRING,
            send_time = TIMESTAMP,
            back_status = STRING,
            create_time = TIMESTAMP,
            urgent_lv = STRING,
            sign = STRING,
            sign_id = INT,
            query_report_flag = INT,
            isp_id = INT,
            user_id = BIGINT,
            enterprise_id = BIGINT,
            mongo_submit_id = STRING,
            link_id = STRING,
            batch_id = STRING,
            push_submit_time = TIMESTAMP,
            back_time = TIMESTAMP,
            number_id = BIGINT,
            receive_report_time = TIMESTAMP,
            report_msg_id = STRING,
            push_report_time = STRING,
            seq_id = STRING,
            seq_order_id = STRING
          }
        }
      },
      {
        schema {
          table = "letterMsg.2024-04-send_detail"
          fields {
            _id = STRING,
            order_id = STRING,
            phone_num = STRING,
            message_context = STRING,
            channel_no = STRING,
            send_status = STRING,
            send_time = TIMESTAMP,
            back_status = STRING,
            create_time = TIMESTAMP,
            urgent_lv = STRING,
            sign = STRING,
            sign_id = INT,
            query_report_flag = INT,
            isp_id = INT,
            user_id = BIGINT,
            enterprise_id = BIGINT,
            mongo_submit_id = STRING,
            link_id = STRING,
            batch_id = STRING,
            push_submit_time = TIMESTAMP,
            back_time = TIMESTAMP,
            number_id = BIGINT,
            receive_report_time = TIMESTAMP,
            report_msg_id = STRING,
            push_report_time = STRING,
            seq_id = STRING,
            seq_order_id = STRING
          }
        }
      }
    ]
  }
  # please go to https://seatunnel.apache.org/docs/connector-v2/source
}

sink {
  Clickhouse {
    host = "10.xxx.xxx.xxx:8123"
    database = "xlsms_test"
    table = "sms_send_detail"
    username = "******"
    password = "*********"
    primary_key = "_id"
    support_upsert = true
    allow_experimental_lightweight_delete = true
  }
}

Running Command

bin/seatunnel.sh --config ./config/streaming-mongocdc-clickhouse.conf -n syncSendDetailJob

Error Exception

Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.engine.common.exception.TaskGroupDeployException: java.lang.RuntimeException: TaskGroupLocation: TaskGroupLocation{jobId=933914021243912194, pipelineId=1, taskGroupId=1} already exists

Zeta or Flink or Spark Version

SeaTunnel Engine (Zeta) in separated cluster mode

Java or Scala Version

java 1.8.0_171

Screenshots

(two screenshots of the logs attached)

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

kwonder0926 (Author) commented

I saw a suggestion online that when the table being transferred is very large, the checkpoint timeout needs to be set higher. But what value should I set it to? Is there any reference for choosing this value?
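
For reference, a minimal sketch of raising the checkpoint timeout at the job level, assuming the documented checkpoint.timeout env option is available in 2.3.x (the 600000 ms value is only an illustrative starting point, not an official recommendation):

env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
  # Assumed option: checkpoint timeout in milliseconds; raise it above the
  # longest checkpoint duration observed in the logs for the large tables.
  checkpoint.timeout = 600000
}

The same timeout can also be set cluster-wide under seatunnel.engine.checkpoint.timeout in seatunnel.yaml. There is no universal value; a common approach is to measure the slowest checkpoint in the logs and set the timeout comfortably above it.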


lphbdfj commented Jan 24, 2025

I have also encountered this issue and have not found a solution yet. I hope the maintainers can provide a fix as soon as possible.
