ReceiverDisconnectedException even when using different consumer groups #680

@HaowenZhangBD

Description

Hi team, we have been seeing ReceiverDisconnectedException in our Databricks environment and have done some research.
We found that other people had a similar problem, which was solved in these two docs:

https://github.com/Azure/azure-event-hubs-spark/blob/master/FAQ.md
https://github.com/Azure/azure-event-hubs-spark/blob/master/examples/multiple-readers-example.md

We read through them and followed the suggestion of using a different consumer group for each stream, but we still get ReceiverDisconnectedException on both streams at roughly the same timestamp.
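For context, this is roughly how the two streams are configured, following the multiple-readers-example doc: one options dict per stream, each with its own consumer group ("job1" and "machine2", taken from the stack traces below). This is a minimal sketch; the connection string is a placeholder and the Spark calls are shown as comments.

```python
def make_eh_conf(connection_string, consumer_group):
    """Build the options dict passed to spark.readStream for one stream."""
    return {
        # On Databricks the connection string would normally be encrypted
        # (e.g. via the connector's EventHubsUtils.encrypt); plain text here
        # only for brevity in this sketch.
        "eventhubs.connectionString": connection_string,
        "eventhubs.consumerGroup": consumer_group,
    }

# Placeholder connection string for the shared Event Hub entity.
conn = "Endpoint=sb://OUR-EVENTHUB.servicebus.windows.net/;EntityPath=publisher-events-eh"

conf_stream1 = make_eh_conf(conn, "job1")
conf_stream2 = make_eh_conf(conn, "machine2")

# Each stream is then started separately, e.g.:
# df1 = spark.readStream.format("eventhubs").options(**conf_stream1).load()
# df2 = spark.readStream.format("eventhubs").options(**conf_stream2).load()
```

The point of this layout is that the two queries never share a consumer group, which per the FAQ should prevent them from stealing each other's epoch receivers.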

Bug Report:

  • Actual behavior

Stream 1 (PATH: publisher-events-eh/ConsumerGroups/job1/Partitions/0):

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5065.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5065.0 (TID 91438) (10.139.64.4 executor driver): java.util.concurrent.CompletionException: com.microsoft.azure.eventhubs.ReceiverDisconnectedException: New receiver 'spark-driver-87' with higher epoch of '0' is created hence current receiver 'spark-driver-87' with epoch '0' is getting disconnected. If you are recreating the receiver, make sure a higher epoch is used. TrackingId:581a6d040004c849000eef7c64ddd416_G27_B39, SystemTracker:OUR EVENTHUB:publisher-events-eh~1023|job1, Timestamp:2023-08-17T08:02:35, errorContext[NS: OUR EVENTHUB, PATH: publisher-events-eh/ConsumerGroups/job1/Partitions/0, REFERENCE_ID: LN_a37906_1692259345344_1af_G27, PREFETCH_COUNT: 500, LINK_CREDIT: 1000, PREFETCH_Q_LEN: 0]

Stream 2 (PATH: publisher-events-eh/ConsumerGroups/machine2/Partitions/0):

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5069.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5069.0 (TID 91503) (10.139.64.4 executor driver): java.util.concurrent.CompletionException: com.microsoft.azure.eventhubs.ReceiverDisconnectedException: New receiver 'spark-driver-315' with higher epoch of '0' is created hence current receiver 'spark-driver-315' with epoch '0' is getting disconnected. If you are recreating the receiver, make sure a higher epoch is used. TrackingId:581a6d040006c849000eef5c64ddd416_G2_B39, SystemTracker:OUR EVENTHUB:publisher-events-eh~1023|machine2, Timestamp:2023-08-17T08:02:35, errorContext[NS: OUR EVENTHUB, PATH: publisher-events-eh/ConsumerGroups/machine2/Partitions/0, REFERENCE_ID: LN_190e6e_1692259345190_e97a_G2, PREFETCH_COUNT: 500, LINK_CREDIT: 1000, PREFETCH_Q_LEN: 0]

  • Expected behavior : no ReceiverDisconnectedException
  • spark-eventhubs artifactId and version : com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22
  • Spark version : (given in a screenshot in the original issue)
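As a sanity check, the consumer group each failing receiver was attached to can be pulled straight out of the errorContext PATH in the two messages above. A small parsing sketch (plain Python, no Spark needed; the err1/err2 strings are abbreviated copies of the logs):

```python
import re

def consumer_group_from_error(message):
    """Extract the consumer group name from the PATH field of an
    Event Hubs errorContext string."""
    m = re.search(r"PATH: [^/]+/ConsumerGroups/([^/]+)/Partitions/(\d+)", message)
    if m is None:
        return None
    return m.group(1)

err1 = "errorContext[NS: OUR EVENTHUB, PATH: publisher-events-eh/ConsumerGroups/job1/Partitions/0]"
err2 = "errorContext[NS: OUR EVENTHUB, PATH: publisher-events-eh/ConsumerGroups/machine2/Partitions/0]"

print(consumer_group_from_error(err1))  # job1
print(consumer_group_from_error(err2))  # machine2
```

Both errors name the same partition (0) but different consumer groups, which matches the setup described above; so the disconnects do not look like a simple consumer-group collision between the two streams.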
