A typical way to achieve exactly-once semantics when consuming from Kafka is to store offsets outside Kafka, atomically with the application state they correspond to. For this, the Alpakka library provides the committablePartitionedManualOffsetSource source. The offsets to start consuming from are supplied through the onAssign function, whose result is a Future containing a TopicPartition -> offset (Long) entry for every partition assigned to this consumer.
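As a rough illustration of the pattern described above, the sketch below models an external store that records processed state and the next offset in a single step, and exposes a lookup with the same shape as the onAssign callback. It is dependency-free on purpose: `TopicPartition` is a local stand-in for Kafka's class, and `InMemoryOffsetStore` with its methods are hypothetical names, not part of Alpakka.

```scala
import scala.concurrent.{ExecutionContext, Future}

object OffsetStoreSketch {
  // Local stand-in for org.apache.kafka.common.TopicPartition (assumption).
  final case class TopicPartition(topic: String, partition: Int)

  // Hypothetical in-memory store; a real one would be a transactional database.
  final class InMemoryOffsetStore {
    private var state   = Map.empty[String, String]
    private var offsets = Map.empty[TopicPartition, Long]

    // State and the next offset to read are updated together under one lock,
    // mimicking a single atomic transaction.
    def commit(tp: TopicPartition, key: String, value: String, processedOffset: Long): Unit =
      synchronized {
        state += (key -> value)
        offsets += (tp -> (processedOffset + 1))
      }

    def stateFor(key: String): Option[String] = synchronized(state.get(key))

    // Same shape as the onAssign callback: assigned partitions in, offsets out.
    // Partitions with no stored offset are simply omitted in this sketch.
    def offsetsOnAssign(assigned: Set[TopicPartition])(
        implicit ec: ExecutionContext
    ): Future[Map[TopicPartition, Long]] =
      Future(synchronized(offsets.filter { case (tp, _) => assigned.contains(tp) }))
  }
}
```

The lookup runs asynchronously, so its result describes the assignment as it was when the call started, which is what the rest of this report hinges on.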
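However, because the onAssign lookup is asynchronous, a rebalance can occur between the previous assignment and the completion of the Future with the offsets. The source then attempts a seek on all of these TopicPartitions, but some of them are no longer assigned to this Kafka consumer:
https://github.com/akka/alpakka-kafka/blob/master/core/src/main/scala/akka/kafka/internal/SubSourceLogic.scala#L154
This results in a failure in the internal KafkaConsumerActor, which is sent back to the SubSourceLogic actor. However, SubSourceLogic only expects an AskTimeoutException, not a Failure(Throwable) message:
https://github.com/akka/alpakka-kafka/blob/master/core/src/main/scala/akka/kafka/internal/SubSourceLogic.scala#L157
https://github.com/akka/alpakka-kafka/blob/master/core/src/main/scala/akka/kafka/internal/KafkaConsumerActor.scala#L287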
The end result is that the Source neither resumes the sub-sources nor fails the consumer/source. It is a silent failure, with no indication that the stream will not recover.
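The silent stall can be reproduced in miniature with plain Scala futures. This is only a sketch of the shape of the problem, not the library's code: `TimeoutException` stands in for AskTimeoutException, and `Outcome`, `Resumed`, `StageFailed`, and `handleSeek` are hypothetical names.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{ExecutionContext, Future}

object SilentFailureSketch {
  sealed trait Outcome
  case object Resumed extends Outcome     // the seek worked, sub-sources are emitted
  case object StageFailed extends Outcome // the stage was told to fail

  // Mirrors a handler that only anticipates a timeout.
  // TimeoutException stands in for AskTimeoutException (assumption).
  def handleSeek(seek: Future[Unit])(implicit ec: ExecutionContext): Future[Outcome] =
    seek
      .map[Outcome](_ => Resumed)
      .recover { case _: TimeoutException => StageFailed }
}
```

If `seek` fails with any other exception, the partial function does not match and the returned future simply stays failed. Unless something inspects that future, neither `Resumed` nor `StageFailed` is ever produced, which corresponds to the behaviour reported here.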
I can suggest two approaches for this:
1. Recover by failing the Source. However, this will cause more rebalances and potentially the same race condition again, repeating until no rebalance happens between the two operations (the assignment and the Future completion). It may also cause unnecessary thrashing and load on the external offset store from which the consumers retrieve their offsets.

       .recover {
         case _: Exception =>
           stageFailCB.invoke(
             new ConsumerFailed(
               s"$idLogPrefix Consumer failed during seek for partitions: ${offsets.keys.mkString(", ")}."
             )
           )
       }

2. Ignore the seek() failures in the KafkaConsumerActor and just send back Done when completed. I believe this would also be a suitable approach, and it doesn't cause more rebalances or further chances for the race condition to keep occurring.
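The second approach could look roughly like the following. This is a dependency-free sketch under stated assumptions, not a patch against KafkaConsumerActor: `TopicPartition` and `FakeConsumer` are local stand-ins, the exception type thrown by the fake `seek` is chosen for illustration only, and `seekIgnoringFailures` is a hypothetical helper.

```scala
import scala.util.{Failure, Success, Try}

object LenientSeekSketch {
  // Local stand-in for Kafka's TopicPartition (assumption).
  final case class TopicPartition(topic: String, partition: Int)

  // Hypothetical consumer stand-in: seek throws for partitions it does not own.
  final class FakeConsumer(val assigned: Set[TopicPartition]) {
    var positions = Map.empty[TopicPartition, Long]

    def seek(tp: TopicPartition, offset: Long): Unit = {
      if (!assigned.contains(tp))
        throw new IllegalStateException(s"No current assignment for partition $tp")
      positions += (tp -> offset)
    }
  }

  // Seeks each partition independently. A failed seek is logged as a warning
  // and skipped instead of aborting the whole batch.
  // Returns the partitions that were skipped.
  def seekIgnoringFailures(
      consumer: FakeConsumer,
      offsets: Map[TopicPartition, Long]
  ): Set[TopicPartition] =
    offsets.foldLeft(Set.empty[TopicPartition]) { case (skipped, (tp, offset)) =>
      Try(consumer.seek(tp, offset)) match {
        case Success(_) => skipped
        case Failure(e) =>
          println(s"WARN: skipping seek for $tp: ${e.getMessage}")
          skipped + tp
      }
    }
}
```

Partitions that were revoked in the meantime are skipped, while the ones still assigned are positioned as requested, so the caller can always reply with Done and no additional rebalance is triggered.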
I am running Scala 2.11 with version 2.0.7 of the library. It would be great to have another release for Scala 2.11.
This is a critical issue with a huge impact on volatile environments (for example, consumers running on AWS spot instances), where consumers may come and go at will.
Thanks for raising the issue. What exception is raised when the seek fails? I think it would be acceptable to log a warning for such failures and carry on. Do you have some time to put up a PR?