Releases: redpanda-data/redpanda
v24.3.18
Bug Fixes
- Fixed an issue with consumer groups with manually assigned consumers. When an OffsetDeleteRequest was sent on such a group, a GROUP_SUBSCRIBED_TO_TOPIC error was returned. by @IoannisRP in #26705
- Increase the default self check timeout from 5s to 10s to leave time to retry DNS lookups if they time out during a self check operation. by @pgellert in #26775
- Resolves a memory leak scenario in node_status_backend by resetting connections which make no progress by @joe-redpanda in #26802
- Return the correct error response if the RPC to the leader for deleting ACLs fails. by @BenPope in #26785
- This fixes a bug in Redpanda's self-check functionality, where the self-check would occasionally fail with 'Uploaded key/payload could not be found in cloud storage item list.' despite the object being successfully uploaded. This issue occurred when testing against an Azure ABS tiered storage endpoint. by @pgellert in #26728
- #26739 Fixes a bug in which a segment produced by adjacent merge compaction did not have its batch cache reset, leading to potentially stale reads in the storage layer. by @WillemKauf in #26741
- #26820 Fixes a bug where data loss could occur during FPM with tiered storage disabled by @oleiman in #26821
Improvements
- Make segment download timeouts configurable in cloud cache hydration by @oleiman in #26779
- PR #26744 [v24.3.x] [CORE-12729] debug/bundle: forward kubernetes env vars to rpk by @IoannisRP
- PR #26815 [v24.3.x] [CORE-8805] dt/archival: Decrease manifest upload interval to avoid race by @oleiman
- PR #26769 [v24.3.x] raft/heartbeat_manager: timeout memory leak fix by @joe-redpanda
Full Changelog: v24.3.17...v24.3.18
v24.2.27
Bug Fixes
- Resolves a memory leak scenario in node_status_backend by resetting connections which make no progress by @joe-redpanda in #26801
- Return the correct error response if the RPC to the leader for deleting ACLs fails. by @BenPope in #26784
- #26819 Fixes a bug where data loss could occur during FPM with tiered storage disabled by @oleiman in #26822
- PR #26817 [v24.2.x] [CORE-8805] dt/archival: Decrease manifest upload interval to avoid race by @oleiman
- PR #26770 [v24.2.x] raft/heartbeat_manager: timeout memory leak fix by @joe-redpanda
Full Changelog: v24.2.26...v24.2.27
v25.1.8
Bug Fixes
- Fixed an issue with consumer groups with manually assigned consumers. When an OffsetDeleteRequest was sent on such a group, a GROUP_SUBSCRIBED_TO_TOPIC error was returned. by @IoannisRP in #26704
- Fixes a bug where data loss could occur during FPM with tiered storage disabled by @oleiman in #26818
- Fixes a hang in RPC dispatch that may result in failed replication and leadership transfers. by @bharathv in #26805
- Increase the default self check timeout from 5s to 10s to leave time to retry DNS lookups if they time out during a self check operation. by @pgellert in #26776
- Resolves a memory leak scenario in node_status_backend by resetting connections which make no progress by @joe-redpanda in #26800
- Return the correct error response if the RPC to the leader for deleting ACLs fails. by @BenPope in #26783
- This fixes a bug in Redpanda's self-check functionality, where the self-check would occasionally fail with 'Uploaded key/payload could not be found in cloud storage item list.' despite the object being successfully uploaded. This issue occurred when testing against an Azure ABS tiered storage endpoint. by @pgellert in #26713
- #26738 Fixes a bug in which a segment produced by adjacent merge compaction did not have its batch cache reset, leading to potentially stale reads in the storage layer. by @WillemKauf in #26740
Improvements
- Fall back to the previously uploaded cluster manifest's group offset snapshot if uploading the group offsets fails for a consumer offsets topic partition. by @pgellert in #26793
- Make segment download timeouts configurable in cloud cache hydration by @oleiman in #26780
- PR #26641 [v25.1.x] [CORE-8392] http: Add shutdown connection error code by @Lazin
- PR #26699 [backport v25.1.x] iceberg/config: mark iceberg auth options as restored (default) by @wdberkeley
- PR #26734 [v25.1.x] [CORE-12729] debug/bundle: forward kubernetes env vars to rpk by @IoannisRP
- PR #26771 [v25.1.x] raft/heartbeat_manager: timeout memory leak fix by @joe-redpanda
Full Changelog: v25.1.7...v25.1.8
v25.1.7
Bug Fixes
Improvements
- PR #26649 [v25.1.x] Backport AWS Glue REST catalog support by @wdberkeley
- PR #26666 [v25.1.x] Added defensive checks when materializing batch records by @mmaslankaprv
- PR #26674 [v25.1.x] c/rm_frontend: more nuanced mapping of error when locking writes by @mmaslankaprv
- PR #26701 [backport v25.1.x] datalake: add default partition spec kludge for AWS Glue by @wdberkeley
Full Changelog: v25.1.6...v25.1.7
v25.1.6
Bug Fixes
- Fix Avro translation to Iceberg when root Avro schema is a primitive type. by @nvartolomei in #26461
- Fix an issue where the audit log could lock down a cluster if misconfigured. It is now always possible to disable it. by @IoannisRP in #26652
- Iceberg integration: Encode Avro record field names to avoid using disallowed characters (like dots with default partitioning hour(redpanda.timestamp)). This makes it possible to read Iceberg data with the latest version of DuckDB. by @nvartolomei in #26535
- Prevents Redpanda from crashing when reading invalid record data by @mmaslankaprv in #26492
Improvements
- Adds support for the Iceberg table properties write.metadata.path and write.data.path. When an Iceberg catalog defines these properties, Redpanda will use them to determine where to write Iceberg table metadata and data, respectively, instead of using default locations based on the table location. by @wdberkeley in #26440
- Allows direct uploading of debug bundles collected with rpk debug remote-bundle by @JFlath in #26515
- Reduces the time spent in fstat() syscalls during storage layer housekeeping, and in the storage layer in general. by @WillemKauf in #26656
- Fixed large allocation issues when handling OffsetCommits by @mmaslankaprv in #26414
- Fixes an issue in which users could experience oversized allocations during a DescribeGroup request. by @WillemKauf in #26532
- Adds the ability to control batch cache settings for the __consumer_offsets topic by @mmaslankaprv in #26558
Full Changelog: v25.1.5...v25.1.6
v24.3.17
Bug Fixes
- Fix an issue where the audit log could lock down a cluster if misconfigured. It is now always possible to disable it. by @IoannisRP in #26651
- Fixes unbounded memory usage in some transaction use cases by @bharathv in #26682
Improvements
- Adds the ability to control batch cache settings for the __consumer_offsets topic by @mmaslankaprv in #26658
- PR #25397 [v24.3.x] [CORE-8946] cloud_storage: Update process_anomalies method by @Lazin
- PR #26177 [v24.3.x] r/consensus: do not block leadership completely in maintenance mode by @mmaslankaprv
- PR #26526 [v24.3.x] ducktape: Respect rpk timeout in rpk by @StephanDollberg
- PR #26548 [v24.3.x] [CORE-12155] Introduce external timeout for cloud_storage client leases by @oleiman
- PR #26554 [v24.3.x] Fix race between bootstrap and shutdown by @bashtanov
- PR #26578 [v24.3.x] csc/client_pool: Add null checks in lease watchdog handler by @oleiman
- PR #26599 [v24.3.x] kc/consumer: fixed resource leak when coordinator changes by @mmaslankaprv
- PR #26638 [v24.3.x] raft: Handle exceptions in background_apply_fiber by @Lazin
- PR #26640 [v24.3.x] [CORE-8392] http: Add shutdown connection error code by @Lazin
- PR #26665 [v24.3.x] Added defensive checks when materializing batch records by @mmaslankaprv
Full Changelog: v24.3.16...v24.3.17
v24.2.26
Bug Fixes
- Fixes unbounded memory usage in some transaction use cases by @bharathv in #26683
- Prevents Redpanda from crashing when reading invalid record data by @mmaslankaprv in #26493
- PR #26433 [v24.2.x] storage: fix index state truncate overflow by @andrwng
Full Changelog: v24.2.25...v24.2.26
v24.3.16
Bug Fixes
- Prevents Redpanda from crashing when reading invalid record data by @mmaslankaprv in #26494
Improvements
- Allows direct uploading of debug bundles collected with rpk debug remote-bundle by @JFlath in #26514
- Fixes an issue in which users could experience oversized allocations during a DescribeGroup request. by @WillemKauf in #26531
- PR #26444 [v24.3.x] storage: call reserve() in storage::range() by @WillemKauf
Full Changelog: v24.3.15...v24.3.16
v24.3.15
Bug Fixes
- Allow partition balancing to operate when space management is enabled but the local target capacity is unset. by @ztlpn in #26306
- Enable TCP keepalive for cloud storage connections. by @Lazin in #26409
- Fix Redpanda crash if partition_autobalancing_concurrent_moves was set to 0. by @ztlpn in #26306
- #26190 Fixes a bug in which a broker would crash during sliding window compaction when started with log_compaction_use_sliding_window=false and its value was later set to true without restarting. by @WillemKauf in #26196
- partition_autobalancing_mode=off now stops on-demand partition rebalance as well. by @ztlpn in #26306
- PR #26082 [v24.3.x] archival: Fix archival_stm_snapshot installation by @Lazin
- PR #26277 [v24.3.x] r/consensus: stop consumable offset monitor by @mmaslankaprv
- PR #26307 [v24.3.x] Fix archival STM shutdown race by @bashtanov
- PR #26432 [v24.3.x] storage: fix index state truncate overflow by @andrwng
Improvements
- Improved handling of rf=1 partitions in health reporting by @mmaslankaprv in #26178
- The per-partition response in AlterPartitionReassignmentsResponse now uses the REASSIGNMENT_IN_PROGRESS error code if a reassignment is requested while the Partition Balancer is moving partition replicas. by @bashtanov in #26349
- Made it easier to detect and diagnose node operation issues by @mmaslankaprv in #26176
- #26202 Adds the storage_log_adjacent_segments_compacted metric for better observability into adjacent segment compaction. by @WillemKauf in #26204
- #26252 rpk: decommission-status reports reallocation failure details by @daisukebe in #26263
- rpk debug bundle: improve reliability of debug bundle collection in k8s environments. by @r-vasquez in #26163
- PR #26108 [v24.3.x] make ntp_callbacks actually support multiple callbacks by @bashtanov
- PR #26173 [v24.3.x] storage: add _segment_cleanly_compacted to probe (Manual backport) by @WillemKauf
- PR #26176 [v24.3.x] Decommission status improvements by @mmaslankaprv
- PR #26223 [v24.3.x] c/archival: wakeup upload loop after flush by @ztlpn
- PR #26230 [v24.3.x] storage: output full segment in WARN log in offset_to_filepos.cc by @WillemKauf
- PR #26339 [v24.3.x] c/partition_manager: added log entries when partition is being shutdown by @mmaslankaprv
Full Changelog: v24.3.14...v24.3.15
v24.2.25
Bug Fixes
- Allow partition balancing to operate when space management is enabled but the local target capacity is unset. by @ztlpn in #26305
- Enable TCP keepalive for cloud storage connections. by @Lazin in #26410
- Fix Redpanda crash if partition_autobalancing_concurrent_moves was set to 0. by @ztlpn in #26305
- When Tiered Storage is paused and data is allowed to expire from local storage, there will be gaps between the last offset in tiered storage and the first offset in local storage. If local storage was truncated in the middle of a segment (e.g. by time-based retention or via trim-prefix/delete records commands), tiered storage might get stuck with the following exception: 'Failed to schedule upload: std::runtime_error (ntp {kafka/foo/0}: log offset N is outside the translation range (starting at M > N))'. Fix this by adjusting the upload start offset to the first available and valid offset. Although we might have a bit more data in the segment, other information about that data (i.e. offset translation) is gone with prefix truncation. by @nvartolomei in #26064
- partition_autobalancing_mode=off now stops on-demand partition rebalance as well. by @ztlpn in #26305
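The upload start adjustment described in the Tiered Storage fix above can be illustrated with a minimal sketch. This is not Redpanda's implementation; the function and parameter names are hypothetical, and the only assumption is that an upload must begin no earlier than the first local offset that still has offset translation state.

```python
def adjust_upload_start(requested_start: int, translation_start: int) -> int:
    """Return the offset at which the next upload should begin.

    requested_start: the offset right after the last one in tiered storage (N).
    translation_start: the first local offset with translation state (M).
    Both names are hypothetical and used for illustration only.
    """
    # If prefix truncation removed translation state below M, skip the gap
    # rather than failing with "outside the translation range".
    return max(requested_start, translation_start)


# Gap case: tiered storage ends at 99, local translation state starts at 250.
print(adjust_upload_start(100, 250))  # 250
# No gap: local storage still covers the requested offset.
print(adjust_upload_start(300, 250))  # 300
```

In the gap case, offsets 100 through 249 are never uploaded, which matches the note that gaps are expected once local data has expired while Tiered Storage was paused.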
Improvements
- The per-partition response in AlterPartitionReassignmentsResponse now uses the REASSIGNMENT_IN_PROGRESS error code if a reassignment is requested while the Partition Balancer is moving partition replicas. by @bashtanov in #26350
- #26135 Swap out an internal data structure in the storage layer to prevent oversized allocations and crashes when a large number of segments are present in a partition. by @WillemKauf in #26138
- rpk transform now uses tinygo v37 to compile golang to Wasm. by @r-vasquez in #26217
- rpk debug bundle: improve reliability of debug bundle collection in k8s environments. by @r-vasquez in #26214
- PR #26010 [v24.2.x] create STMs based on original topic cfg by @bashtanov
- PR #26340 [v24.2.x] c/partition_manager: added log entries when partition is being shutdown by @mmaslankaprv
- PR #26344 [v24.2.x] raft/test/leadership_transfer_delay: increase tolerance by @bashtanov
- PR #26345 [v24.2.x] make ntp_callbacks actually support multiple callbacks by @bashtanov
- PR #26354 Revert "[v24.2.x] raft/log_eviction_stm: avoid unnecessary wait on visible offset" by @bharathv
- PR #26420 [v24.2.x] Fix archival STM shutdown race by @bashtanov
- PR #26428 [backport] [v24.2.x] raft/c: warn on struck truncation. by @bharathv
- PR #26442 [v24.2.x] storage: call reserve() in storage::range() by @WillemKauf
Full Changelog: v24.2.24...v24.2.25