Releases: confluentinc/librdkafka
v2.6.0
librdkafka v2.6.0 is a feature release:
- KIP-460 Admin Leader Election RPC (#4845)
- [KIP-714] Complete consumer metrics support (#4808).
- [KIP-714] Produce latency average and maximum metrics support for parity with Java client (#4847).
- [KIP-848] ListConsumerGroups Admin API now has an optional filter to return only groups
of given types.
- Added Transactional id resource type for ACL operations (@JohnPreston, #4856).
- Fix for permanent fetch errors when using a newer Fetch RPC version with an older
inter broker protocol (#4806).
Fixes
Consumer fixes
- Issues: #4806
Fix for permanent fetch errors when brokers support a Fetch RPC version greater than 12
but cluster is configured to use an inter broker protocol that is less than 2.8.
In this case returned topic ids are zero valued and Fetch has to fall back
to version 12, using topic names.
Happening since v2.5.0 (#4806)
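The fallback described above can be sketched as follows. This is an illustrative model, not librdkafka's code: the function name `choose_fetch_version`, the 16-byte all-zero representation of a zero valued topic id, and the parameter names are assumptions made for the example.

```python
# Hypothetical sketch of the Fetch version fallback; all names are illustrative.
ZERO_TOPIC_ID = bytes(16)           # assumed representation of a zero valued topic id
NAME_BASED_FETCH_VERSION = 12       # per the notes: fall back to version 12, using topic names


def choose_fetch_version(broker_max_version: int, topic_id: bytes) -> int:
    """Pick the Fetch RPC version to use for a topic."""
    # A newer Fetch version needs a usable topic id; a zero valued id means
    # the cluster cannot provide one, so topic names must be used instead.
    if broker_max_version > NAME_BASED_FETCH_VERSION and topic_id == ZERO_TOPIC_ID:
        return NAME_BASED_FETCH_VERSION
    return broker_max_version
```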
Checksums
Release asset checksums:
- v2.6.0.zip SHA256
e9eb7faedb24da3a19d5f056e08630fc2dae112d958f9b714ec6e35cd87c032e
- v2.6.0.tar.gz SHA256
abe0212ecd3e7ed3c4818a4f2baf7bf916e845e902bb15ae48834ca2d36ac745
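One way to check a downloaded asset against a published SHA256 value is sketched here. The helper names `sha256_of_file` and `matches_checksum` are hypothetical, and the file path is assumed to point at an archive you have already downloaded.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("v2.6.0.tar.gz", "abe0212ecd3e7ed3c4818a4f2baf7bf916e845e902bb15ae48834ca2d36ac745")` should return `True` for an intact copy of that archive saved under that name.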
v2.5.3
librdkafka v2.5.3 is a maintenance release.
- Fix an assert being triggered during push telemetry call when no metrics matched on the client side. (#4826)
Fixes
Telemetry fixes
- Issue: #4833
Fix a regression introduced with KIP-714 support in which an assert is triggered during PushTelemetry call. This happens when no metric is matched on the client side among those requested by broker subscription.
Happening since 2.5.0 (#4826).
Checksums
Release asset checksums:
- v2.5.3.zip SHA256
5b058006fcd403bc23fc1fcc14fe985641203f342c5715794af51023bcd047f9
- v2.5.3.tar.gz SHA256
eaa1213fdddf9c43e28834d9a832d9dd732377d35121e42f875966305f52b8ff
Note: there were no v2.5.1 and v2.5.2 librdkafka releases
v2.5.0
Warning
This version has introduced a regression in which an assert is triggered during PushTelemetry call. This happens when no metric is matched on the client side among those requested by broker subscription.
You won't face any problem if:
- Broker doesn't support KIP-714.
- KIP-714 feature is disabled on the broker side.
- KIP-714 feature is disabled on the client side. It is enabled by default; to disable it, set the configuration
property enable.metrics.push to false.
- If KIP-714 is enabled on the broker side and there is no subscription configured there.
- If KIP-714 is enabled on the broker side with subscriptions that match the KIP-714 metrics defined on the client.
Having said this, we strongly recommend using v2.5.3
and above to not face this regression at all.
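For clients that cannot upgrade right away, the client-side workaround amounts to one configuration property. The sketch assumes a dictionary-style configuration, as used by some language bindings; the helper name `without_metrics_push` and the broker address are placeholders, and only `enable.metrics.push` comes from these notes.

```python
def without_metrics_push(conf: dict) -> dict:
    """Return a copy of a client configuration with KIP-714 metrics push disabled."""
    updated = dict(conf)
    updated["enable.metrics.push"] = "false"
    return updated


# Placeholder base configuration; the address is an assumption for the example.
base_conf = {"bootstrap.servers": "localhost:9092"}
conf = without_metrics_push(base_conf)
```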
librdkafka v2.5.0 is a feature release.
- KIP-951 Leader discovery optimisations for the client (#4756, #4767).
- Fix segfault when using long client id because of erased segment when using flexver. (#4689)
- Fix for an idempotent producer error, with a message batch not reconstructed
identically when retried (#4750)
- Removed support for CentOS 6 and CentOS 7 (#4775).
- KIP-714 Client metrics and observability (#4721).
Upgrade considerations
- CentOS 6 and CentOS 7 support was removed as they reached EOL
and security patches aren't publicly available anymore.
ABI compatibility from CentOS 8 on is maintained through pypa/manylinux,
AlmaLinux based.
See also Confluent supported OSs page (#4775).
Enhancements
- Update bundled lz4 (used when ./configure --disable-lz4-ext) to
v1.9.4, which contains bugfixes and performance improvements (#4726).
- KIP-951
With this KIP leader updates are received through Produce and Fetch responses
in case of errors corresponding to leader changes and a partition migration
happens before refreshing the metadata cache (#4756, #4767).
Fixes
General fixes
- Issues: confluentinc/confluent-kafka-dotnet#2084
Fix segfault when a segment is erased and more data is written to the buffer.
Happens since 1.x when a portion of the buffer (segment) is erased for flexver or compression.
More likely to happen since 2.1.0, because of the upgrades to flexver, with certain string sizes like a long client id (#4689).
Idempotent producer fixes
- Issues: #4736
Fix for an idempotent producer error, with a message batch not reconstructed
identically when retried. Caused the error message "Local: Inconsistent state: Unable to reconstruct MessageSet".
Happening on large batches. Solved by using the same backoff baseline for all messages
in the batch.
Happens since 2.2.0 (#4750).
Checksums
Release asset checksums:
- v2.5.0.zip SHA256
644c1b7425e2241ee056cf8a469c84d69c7f6a88559491c0813a6cdeb5563206
- v2.5.0.tar.gz SHA256
3dc62de731fd516dfb1032861d9a580d4d0b5b0856beb0f185d06df8e6c26259
v2.4.0
librdkafka v2.4.0 is a feature release:
- KIP-848: The Next Generation of the Consumer Rebalance Protocol.
Early Access: This should be used only for evaluation and must not be used in production. Features and contract of this KIP might change in future (#4610).
- KIP-467: Augment ProduceResponse error messaging for specific culprit records (#4583).
- KIP-516
Continue partial implementation by adding a metadata cache by topic id
and updating the topic id corresponding to the partition name (#4676)
- Upgrade OpenSSL to v3.0.12 (while building from source) with various security fixes,
check the release notes.
- Integration tests can be started in KRaft mode and run against any
GitHub Kafka branch other than the released versions.
- Fix pipeline inclusion of static binaries (#4666)
- Fix to main loop timeout calculation leading to a tight loop for a
max period of 1 ms (#4671).
- Fixed a bug causing duplicate message consumption from a stale
fetch start offset in some particular cases (#4636)
- Fix to metadata cache expiration on full metadata refresh (#4677).
- Fix for a wrong error returned on full metadata refresh before joining
a consumer group (#4678).
- Fix to metadata refresh interruption (#4679).
- Fix for an undesired partition migration with stale leader epoch (#4680).
- Fix hang in cooperative consumer mode if an assignment is processed
while closing the consumer (#4528).
Upgrade considerations
- With KIP 467, INVALID_MSG (Java: CorruptRecordException) will
be retried automatically. INVALID_RECORD (Java: InvalidRecordException) instead
is not retriable and will be set only to the records that caused the
error. The rest of the records in the batch will fail with the new error code
_INVALID_DIFFERENT_RECORD (Java: KafkaException) and can be retried manually,
depending on the application logic (#4583).
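The three outcomes described for KIP-467 can be summarised as a small lookup. This is only a sketch of application-side reasoning: the function name `produce_error_action` and the action strings are hypothetical, while the error code names are the ones listed in the note.

```python
RETRIED_AUTOMATICALLY = "retried automatically by the client"
NOT_RETRIABLE = "not retriable: this record caused the error"
MANUAL_RETRY = "may be retried manually by the application"


def produce_error_action(error_name: str) -> str:
    """Map a produce error code name to the handling described in the notes."""
    actions = {
        "INVALID_MSG": RETRIED_AUTOMATICALLY,
        "INVALID_RECORD": NOT_RETRIABLE,
        "_INVALID_DIFFERENT_RECORD": MANUAL_RETRY,
    }
    # Any other error code is outside the scope of this sketch.
    return actions.get(error_name, "not covered by this sketch")
```

Whether a manual retry is appropriate for `_INVALID_DIFFERENT_RECORD` still depends on the application logic, for instance on whether ordering relative to the rejected record matters.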
Early Access
KIP-848: The Next Generation of the Consumer Rebalance Protocol
- With this new protocol the role of the Group Leader (a member) is removed and
the assignment is calculated by the Group Coordinator (a broker) and sent
to each member through heartbeats. The feature is still not production-ready.
It's possible to try it in a non-production environment. A guide is available
with considerations and steps to follow to test it (#4610).
Fixes
General fixes
- Issues: confluentinc/confluent-kafka-go#981.
In librdkafka release pipeline a static build containing libsasl2
could be chosen instead of the alternative one without it.
That caused the libsasl2 dependency to be required in confluent-kafka-go
v2.1.0-linux-musl-arm64 and v2.3.0-linux-musl-arm64.
Solved by correctly excluding the binary configured with that library,
when targeting a static build.
Happening since v2.0.2, with specified platforms,
when using static binaries (#4666).
- Issues: #4684.
When the main thread loop was awakened less than 1 ms
before the expiration of a timeout, it was serving with a zero timeout,
leading to increased CPU usage until the timeout was reached.
Happening since 1.x.
- Issues: #4685.
Metadata cache was cleared on full metadata refresh, leading to unnecessary
refreshes and occasional UNKNOWN_TOPIC_OR_PART errors. Solved by updating
cache for existing or hinted entries instead of clearing them.
Happening since 2.1.0 (#4677).
- Issues: #4589.
A metadata call before a member joins a consumer group
could lead to an UNKNOWN_TOPIC_OR_PART error. Solved by updating
the consumer group following a metadata refresh only in safe states.
Happening since 2.1.0 (#4678).
- Issues: #4577.
Metadata refreshes without partition leader change could lead to a loop of
metadata calls at fixed intervals. Solved by stopping metadata refresh when
all existing metadata is non-stale. Happening since 2.3.0 (#4679).
- Issues: #4687.
A partition migration could happen, using stale metadata, when the partition
was undergoing a validation and being retried because of an error.
Solved by doing a partition migration only with a non-stale leader epoch.
Happening since 2.1.0 (#4680).
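The zero-timeout behaviour from issue #4684 is not shown in code in these notes. One common way a sub-millisecond remainder turns into a zero timeout is integer truncation when converting microseconds to milliseconds. The sketch illustrates only that arithmetic, under that assumption; both function names are hypothetical and this is not librdkafka's implementation.

```python
def remaining_ms_truncated(remaining_us: int) -> int:
    """Convert microseconds to milliseconds by truncation (rounds down)."""
    return remaining_us // 1000


def remaining_ms_rounded_up(remaining_us: int) -> int:
    """Convert microseconds to milliseconds, rounding any remainder up."""
    return (remaining_us + 999) // 1000
```

With 400 microseconds left, the truncating version yields 0 ms, so a loop waiting with that value returns immediately and spins until the deadline passes, while the rounding version yields 1 ms and waits once.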
Consumer fixes
- Issues: #4686.
In case of subscription change with a consumer using the cooperative assignor
it could resume fetching from a previous position.
That could also happen if resuming a partition that wasn't paused.
Fixed by ensuring that a resume operation is completely a no-op when
the partition isn't paused.
Happening since 1.x (#4636).
- Issues: #4527.
While using the cooperative assignor, if an assignment is received while closing the consumer,
it's possible that it gets stuck in state WAIT_ASSIGN_CALL, while the method is converted to
a full unassign. Solved by changing state from WAIT_ASSIGN_CALL to WAIT_UNASSIGN_CALL
while doing this conversion.
Happening since 1.x (#4528).
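The fix for #4686 relies on resume being a complete no-op for a partition that isn't paused. A toy model of that guard is sketched here; the `Partition` class and its fields are hypothetical and much simpler than the real partition state.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    fetch_position: int    # where fetching currently is
    resume_position: int   # position a resume would restart from (may be stale)
    paused: bool = False

    def pause(self) -> None:
        self.paused = True
        self.resume_position = self.fetch_position

    def resume(self) -> None:
        if not self.paused:
            return  # complete no-op: never rewind to a stale position
        self.paused = False
        self.fetch_position = self.resume_position
```

Without the early return, resuming a partition that was never paused would move `fetch_position` back to whatever `resume_position` last held.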
Checksums
Release asset checksums:
- v2.4.0.zip SHA256
24b30d394fc6ce5535eaa3c356ed9ed9ae4a6c9b4fc9159c322a776786d5dd15
- v2.4.0.tar.gz SHA256
d645e47d961db47f1ead29652606a502bdd2a880c85c1e060e94eea040f1a19a
v2.3.0
librdkafka v2.3.0 is a feature release:
- KIP-516
Partial support of topic identifiers. Topic identifiers in metadata response
available through the new rd_kafka_DescribeTopics function (#4300, #4451).
- KIP-117 Add support for AdminAPI DescribeCluster() and DescribeTopics()
(#4240, @jainruchir).
- KIP-430:
Return authorized operations in Describe Responses.
(#4240, @jainruchir).
- KIP-580: Added Exponential Backoff mechanism for
retriable requests with retry.backoff.ms as minimum backoff and retry.backoff.max.ms as the
maximum backoff, with 20% jitter (#4422).
- KIP-396: completed the implementation with
the addition of ListOffsets (#4225).
- Fixed ListConsumerGroupOffsets not fetching offsets for all the topics in a group with Apache Kafka version below 2.4.0.
- Add a missing destroy, whose absence leaked partition structure memory when there
are partition leader changes and a stale leader epoch is received (#4429).
- Fix a segmentation fault when closing a consumer using the
cooperative-sticky assignor before the first assignment (#4381).
- Fix for insufficient buffer allocation when allocating rack information (@wolfchimneyrock, #4449).
- Fix for infinite loop of OffsetForLeaderEpoch requests on quick leader changes. (#4433).
- Fix to add leader epoch to control messages, to make sure they're stored
for committing even without a subsequent fetch message (#4434).
- Fix for stored offsets not being committed if they lacked the leader epoch (#4442).
- Upgrade OpenSSL to v3.0.11 (while building from source) with various security fixes,
check the release notes
(#4454, started by @migarc1).
- Fix to ensure permanent errors during offset validation continue being retried and
don't cause an offset reset (#4447).
- Fix to ensure max.poll.interval.ms is reset when rd_kafka_poll is called with
consume_cb (#4431).
- Fix for idempotent producer fatal errors, triggered after a possibly persisted message state (#4438).
- Fix rd_kafka_query_watermark_offsets continuing beyond timeout expiry (#4460).
- Fix rd_kafka_query_watermark_offsets not refreshing the partition leader
after a leader change and subsequent NOT_LEADER_OR_FOLLOWER error (#4225).
Upgrade considerations
- retry.backoff.ms:
If it is set greater than retry.backoff.max.ms, which has the default value of 1000 ms, then it assumes the value of retry.backoff.max.ms.
To change this behaviour make sure that retry.backoff.ms is always less than retry.backoff.max.ms.
If equal then the backoff will be linear instead of exponential.
- topic.metadata.refresh.fast.interval.ms:
If it is set greater than retry.backoff.max.ms, which has the default value of 1000 ms, then it assumes the value of retry.backoff.max.ms.
To change this behaviour make sure that topic.metadata.refresh.fast.interval.ms is always less than retry.backoff.max.ms.
If equal then the backoff will be linear instead of exponential.
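The interaction between the minimum and maximum backoff can be illustrated with a small calculation. This is a sketch under stated assumptions rather than the client's actual formula: it assumes the backoff doubles per attempt, that jitter is a multiplier within 20% of 1.0, and a 100 ms starting value; only the 1000 ms maximum default and the clamping rule come from these notes. The function name `backoff_for_attempt` is hypothetical.

```python
import random


def backoff_for_attempt(attempt, backoff_ms=100, backoff_max_ms=1000, jitter=None):
    """Return the backoff in ms for a zero-based retry attempt."""
    # A minimum above the maximum assumes the value of the maximum.
    base = min(backoff_ms, backoff_max_ms)
    if jitter is None:
        jitter = random.uniform(0.8, 1.2)  # assumed jitter range of +/- 20%
    # Assumed exponential growth, capped at the configured maximum.
    return min(base * (2 ** attempt) * jitter, backoff_max_ms)
```

With jitter fixed at 1.0, attempts 0, 1 and 2 give 100, 200 and 400 ms, and later attempts are capped at 1000 ms. Setting the minimum to 5000 ms gives 1000 ms from the first attempt, which is the clamping the upgrade note warns about.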
Fixes
General fixes
- An assertion failed with insufficient buffer size when allocating
rack information on 32bit architectures.
Solved by aligning all allocations to the maximum allowed word size (#4449).
- The timeout for rd_kafka_query_watermark_offsets was not enforced after
making the necessary ListOffsets requests, and thus, it never timed out in
case of broker/network issues. Fixed by setting an absolute timeout (#4460).
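The idea of an absolute timeout can be shown in general terms: compute one deadline up front and give every later step only the time that is left. The sketch is a generic illustration, not the library's code; `call_with_deadline` and its `steps` argument (callables that accept the remaining seconds) are hypothetical.

```python
import time


def call_with_deadline(steps, timeout_s, clock=time.monotonic):
    """Run steps in order, enforcing a single deadline across all of them."""
    deadline = clock() + timeout_s  # absolute point in time, computed once
    results = []
    for step in steps:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("timed out before all steps completed")
        # Each step receives only what is left of the overall budget.
        results.append(step(remaining))
    return results
```

Passing a relative timeout to each step separately would let the total wait grow with the number of steps, which is the kind of unbounded wait the fix addresses.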
Idempotent producer fixes
- After a possibly persisted error, such as a disconnection or a timeout, the next expected sequence
used to increase, leading to a fatal error if the message wasn't persisted and
the second one in queue failed with an OUT_OF_ORDER_SEQUENCE_NUMBER.
The error could contain the message "sequence desynchronization" with
just one possibly persisted error or "rewound sequence number" in case of
multiple errored messages.
Solved by treating the possibly persisted message as not persisted,
and expecting a DUPLICATE_SEQUENCE_NUMBER error in case it was or
NO_ERROR in case it wasn't; in both cases the message will be considered
delivered (#4438).
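The outcome of retrying a possibly persisted message, as described above, reduces to a two-case check. The function name `delivered_after_retry` is hypothetical; the error names are those given in the note.

```python
def delivered_after_retry(retry_error: str) -> bool:
    """Decide whether a retried, possibly persisted message counts as delivered."""
    # NO_ERROR: the first attempt was not persisted and the retry succeeded.
    # DUPLICATE_SEQUENCE_NUMBER: the first attempt was persisted after all.
    return retry_error in ("NO_ERROR", "DUPLICATE_SEQUENCE_NUMBER")
```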
Consumer fixes
- Stored offsets were excluded from the commit if the leader epoch was
less than committed epoch, as it's possible if leader epoch is the default -1.
This didn't happen in Python, Go and .NET bindings when stored position was
taken from the message.
Solved by checking only that the stored offset is greater
than committed one, if either stored or committed leader epoch is -1 (#4442).
- If an OffsetForLeaderEpoch request was being retried, and the leader changed
while the retry was in-flight, an infinite loop of requests was triggered,
because we weren't updating the leader epoch correctly.
Fixed by updating the leader epoch before sending the request (#4433).
- During offset validation a permanent error like host resolution failure
would cause an offset reset.
This isn't what's expected or what the Java implementation does.
Solved by retrying even in case of permanent errors (#4447).
- Using rd_kafka_poll_set_consumer along with a consume callback, and then
calling rd_kafka_poll to service the callbacks, would not reset
max.poll.interval.ms.
This was because we were only checking rk_rep for
consumer messages, while the method to service the queue internally also
services the queue forwarded to from rk_rep, which is rkcg_q.
Solved by moving the max.poll.interval.ms check into rd_kafka_q_serve (#4431).
- After a leader change a rd_kafka_query_watermark_offsets call would continue
trying to call ListOffsets on the old leader, if the topic wasn't included in
the subscription set, so it started querying the new leader only after
topic.metadata.refresh.interval.ms (#4225).
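The commit decision from the first consumer fix (#4442) can be sketched as a comparison function. The name `should_commit` is hypothetical, and the branch for two known epochs (ordering by epoch first, then offset) is an assumption; the notes only specify the case where either epoch is -1.

```python
NO_LEADER_EPOCH = -1  # default leader epoch named in the notes


def should_commit(stored_offset, stored_epoch, committed_offset, committed_epoch):
    """Decide whether a stored offset should be included in the commit."""
    if NO_LEADER_EPOCH in (stored_epoch, committed_epoch):
        # Per the fix: with an unknown epoch on either side, compare offsets only.
        return stored_offset > committed_offset
    # Assumption: with both epochs known, order by epoch first and then by offset.
    return (stored_epoch, stored_offset) > (committed_epoch, committed_offset)
```

Before the fix, a stored offset carrying the default epoch -1 would compare as older than any committed epoch and be skipped, even when the offset itself had advanced.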
Checksums
Release asset checksums:
- v2.3.0.zip SHA256
15e77455811b3e5d869d6f97ce765b634c7583da188792e2930a2098728e932b
- v2.3.0.tar.gz SHA256
2d49c35c77eeb3d42fa61c43757fcbb6a206daa560247154e60642bcdcc14d12
v2.2.0
librdkafka v2.2.0 is a feature release:
- Fix a segmentation fault when subscribing to non-existent topics and
using the consume batch functions (#4273).
- Store offset commit metadata in rd_kafka_offsets_store (@mathispesch, #4084).
- Fix a bug that happens when skipping tags, causing buffer underflow in
MetadataResponse (#4278).
- Fix a bug where topic leader is not refreshed in the same metadata call even if the leader is
present.
- KIP-881:
Add support for rack-aware partition assignment for consumers
(#4184, #4291, #4252).
- Fix several bugs with sticky assignor in case of partition ownership
changing between members of the consumer group (#4252).
- KIP-368:
Allow SASL Connections to Periodically Re-Authenticate
(#4301, started by @vctoriawu).
- Avoid treating an OpenSSL error as a permanent error and treat unclean SSL
closes as normal ones (#4294).
- Added fetch.queue.backoff.ms to the consumer to control how long
the consumer backs off next fetch attempt. (@bitemyapp, @edenhill, #2879)
- KIP-235:
Add DNS alias support for secured connection (#4292).
- KIP-339:
IncrementalAlterConfigs API (started by @PrasanthV454, #4110).
- KIP-554: Add Broker-side SCRAM Config API (#4241).
Enhancements
- Added fetch.queue.backoff.ms to the consumer to control how long
the consumer backs off next fetch attempt. When the pre-fetch queue
has exceeded its queuing thresholds, queued.min.messages and
queued.max.messages.kbytes, it backs off for 1 second.
If those parameters have to be set too high to hold 1 s of data,
this new parameter allows backing off the fetch earlier, reducing memory
requirements.
Fixes
General fixes
- Fix a bug that happens when skipping tags, causing buffer underflow in
MetadataResponse. This is triggered since RPC version 9 (v2.1.0),
when using Confluent Platform, only when racks are set,
observers are activated and there is more than one partition.
Fixed by skipping the correct amount of bytes when tags are received.
- Avoid treating an OpenSSL error as a permanent error and treat unclean SSL
closes as normal ones. When SSL connections are closed without close_notify,
in OpenSSL 3.x a new type of error is set and it was interpreted as permanent
in librdkafka. It can cause a different issue depending on the RPC.
If received when waiting for OffsetForLeaderEpoch response, it triggers
an offset reset following the configured policy.
Solved by treating SSL errors as transport errors and
by setting an OpenSSL flag that allows treating unclean SSL closes as normal
ones. These types of errors can happen if the other side doesn't support close_notify
or if there's a TCP connection reset.
Consumer fixes
- In case of multiple owners of a partition with different generations, the
sticky assignor would pick the earliest (lowest generation) member as the
current owner, which would lead to stickiness violations. Fixed by
choosing the latest (highest generation) member.
- In case where the same partition is owned by two members with the same
generation, it indicates an issue. The sticky assignor had some code to
handle this, but it was non-functional, and did not have parity with the
Java assignor. Fixed by invalidating any such partition from the current
assignment completely.
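Both sticky assignor rules can be expressed in a few lines. This is a simplified illustration, not the assignor's code: `current_owner` is a hypothetical helper that takes the ownership claims for one partition as `(member_id, generation)` pairs.

```python
def current_owner(claims):
    """Return the member treated as current owner of a partition, or None."""
    if not claims:
        return None
    latest = max(generation for _, generation in claims)
    owners = [member for member, generation in claims if generation == latest]
    # Two owners in the same generation indicate an issue: invalidate the
    # partition from the current assignment instead of picking one of them.
    return owners[0] if len(owners) == 1 else None
```

For claims `[("a", 1), ("b", 3)]` the owner is `"b"`, the highest generation; for `[("a", 3), ("b", 3)]` the result is `None`.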
Checksums
Release asset checksums:
- v2.2.0.zip SHA256
e9a99476dd326089ce986afd3a5b069ef8b93dbb845bc5157b3d94894de53567
- v2.2.0.tar.gz SHA256
af9a820cbecbc64115629471df7c7cecd40403b6c34bfdbb9223152677a47226
v2.1.1
librdkafka v2.1.1 is a maintenance release:
- Avoid duplicate messages when a fetch response is received
in the middle of an offset validation request (#4261).
- Fix segmentation fault when subscribing to a non-existent topic and
calling rd_kafka_message_leader_epoch() on the polled rkmessage (#4245).
- Fix a segmentation fault when fetching from follower and the partition lease
expires while waiting for the result of a list offsets operation (#4254).
- Fix documentation for the admin request timeout, incorrectly stating -1 for infinite
timeout. That timeout can't be infinite.
- Fix CMake pkg-config cURL require and use
pkg-config Requires.private field (@FantasqueX, @stertingen, #4180).
- Fixes certain cases where polling would not keep the consumer
in the group or make it rejoin it (#4256).
- Fix to the C++ set_leader_epoch method of TopicPartitionImpl,
that wasn't storing the passed value (@pavel-pimenov, #4267).
Fixes
Consumer fixes
- Duplicate messages can be emitted when a fetch response is received
in the middle of an offset validation request. Solved by avoiding
a restart from last application offset when offset validation succeeds.
- When fetching from follower, if the partition lease expires after 5 minutes,
and a list offsets operation was requested to retrieve the earliest
or latest offset, it resulted in segmentation fault. This was fixed by
allowing threads different from the main one to call
the rd_kafka_toppar_set_fetch_state function, given they hold
the lock on the rktp.
- In v2.1.0, a bug was fixed which caused polling any queue to reset the
max.poll.interval.ms. Only certain functions were made to reset the timer,
but it is possible for the user to obtain the queue with messages from
the broker, skipping these functions. This was fixed by encoding in the
queue itself whether polling it resets the timer.
Checksums
Release asset checksums:
- v2.1.1.zip SHA256
3b8a59f71e22a8070e0ae7a6b7ad7e90d39da8fddc41ce6c5d596ee7f5a4be4b
- v2.1.1.tar.gz SHA256
7be1fc37ab10ebdc037d5c5a9b35b48931edafffae054b488faaff99e60e0108
v2.1.0
librdkafka v2.1.0 is a feature release:
- KIP-320
Allow fetchers to detect and handle log truncation (#4122).
- Fix a reference count issue blocking the consumer from closing (#4187).
- Fix a protocol issue with ListGroups API, where an extra
field was appended for API Versions greater than or equal to 3 (#4207).
- Fix an issue with max.poll.interval.ms, where polling any queue would cause
the timeout to be reset (#4176).
- Fix seek partition timeout, was one thousand times lower than the passed
value (#4230).
- Fix multiple inconsistent behaviour in batch APIs during pause or resume operations (#4208).
See Consumer fixes section below for more information.
- Update lz4.c from upstream. Fixes CVE-2021-3520
(by @filimonov, #4232).
- Upgrade OpenSSL to v3.0.8 with various security fixes,
check the release notes (#4215).
Enhancements
- Added rd_kafka_topic_partition_get_leader_epoch() (and set..()).
- Added partition leader epoch APIs:
  - rd_kafka_topic_partition_get_leader_epoch() (and set..())
  - rd_kafka_message_leader_epoch()
  - rd_kafka_*assign() and rd_kafka_seek_partitions() now support
    partitions with a leader epoch set.
  - rd_kafka_offsets_for_times() will return per-partition leader-epochs.
  - leader_epoch, stored_leader_epoch, and committed_leader_epoch
    added to per-partition statistics.
Fixes
OpenSSL fixes
- Fixed OpenSSL static build not able to use external modules like FIPS
provider module.
Consumer fixes
- A reference count issue was blocking the consumer from closing.
The problem would happen when a partition is lost, because it is forcibly
unassigned from the consumer or if the corresponding topic is deleted.
- When using rd_kafka_seek_partitions, the remaining timeout was
converted from microseconds to milliseconds but the expected unit
for that parameter is microseconds.
- Fixed known issues related to Batch Consume APIs mentioned in v2.0.0
release notes.
- Fixed rd_kafka_consume_batch() and rd_kafka_consume_batch_queue()
intermittently updating app_offset and store_offset incorrectly when
pause or resume was being used for a partition.
- Fixed rd_kafka_consume_batch() and rd_kafka_consume_batch_queue()
intermittently skipping offsets when pause or resume was being
used for a partition.
Known Issues
Consume Batch API
- When rd_kafka_consume_batch() and rd_kafka_consume_batch_queue() APIs are used with
any of the seek, pause, resume or rebalancing operation, on_consume
interceptors might be called incorrectly (maybe multiple times) for not consumed messages.
Consume API
- Duplicate messages can be emitted when a fetch response is received
in the middle of an offset validation request.
- Segmentation fault when subscribing to a non-existent topic and
calling rd_kafka_message_leader_epoch() on the polled rkmessage.
Checksums
Release asset checksums:
- v2.1.0.zip SHA256
2fe898f9f5e2b287d26c5f929c600e2772403a594a691e0560a2a1f2706edf57
- v2.1.0.tar.gz SHA256
d8e76c4b1cde99e283a19868feaaff5778aa5c6f35790036c5ef44bc5b5187aa
v2.0.2
librdkafka v2.0.2 is a bugfix release:
- Fix OpenSSL version in Win32 nuget package (#4152).
Checksums
Release asset checksums:
- v2.0.2.zip SHA256
87010c722111539dc3c258a6be0c03b2d6d4a607168b65992eb0076c647e4e9d
- v2.0.2.tar.gz SHA256
f321bcb1e015a34114c83cf1aa7b99ee260236aab096b85c003170c90a47ca9d
v2.0.1
librdkafka v2.0.1 is a bugfix release:
- Fixed nuget package for Linux ARM64 release (#4150).
Checksums
Release asset checksums:
- v2.0.1.zip SHA256
7121df3fad1f72ea1c42dcc4e5367337207a75966216c63e58222c6433c528e0
- v2.0.1.tar.gz SHA256
3670f8d522e77f79f9d09a22387297ab58d1156b22de12ef96e58b7d57fca139