Releases: redpanda-data/redpanda

v24.3.18

17 Jul 07:01
7107539

Bug Fixes

  • Fixed an issue with consumer groups with manually assigned consumers. When an OffsetDeleteRequest was sent on such a group, a GROUP_SUBSCRIBED_TO_TOPIC error was returned. by @IoannisRP in #26705
  • Increase the default self check timeout from 5s to 10s to leave time to retry DNS lookups if they time out during a self check operation. by @pgellert in #26775
  • Resolves a memory leak scenario in node_status_backend by resetting connections which make no progress by @joe-redpanda in #26802
  • Return the correct error response if the RPC to the leader for deleting ACLs fails. by @BenPope in #26785
  • This fixes a bug in Redpanda's self-check functionality, where the self-check would occasionally fail with 'Uploaded key/payload could not be found in cloud storage item list.' despite the object being successfully uploaded. This issue occurred when testing against an Azure ABS tiered storage endpoint. by @pgellert in #26728
  • #26739 Fixes a bug in which a segment produced by adjacent merge compaction did not have its batch cache reset, leading to potentially stale reads in the storage layer. by @WillemKauf in #26741
  • #26820 Fixes a bug where data loss could occur during FPM with tiered storage disabled by @oleiman in #26821

Improvements

  • Make segment download timeouts configurable in cloud cache hydration by @oleiman in #26779
  • PR #26744 [v24.3.x] [CORE-12729] debug/bundle: forward kubernetes env vars to rpk by @IoannisRP
  • PR #26815 [v24.3.x] [CORE-8805] dt/archival: Decrease manifest upload interval to avoid race by @oleiman
  • PR #26769 [v24.3.x] raft/heartbeat_manager: timeout memory leak fix by @joe-redpanda

Full Changelog: v24.3.17...v24.3.18

v24.2.27

16 Jul 12:49
754dc2c

Bug Fixes

  • Resolves a memory leak scenario in node_status_backend by resetting connections which make no progress by @joe-redpanda in #26801
  • Return the correct error response if the RPC to the leader for deleting ACLs fails. by @BenPope in #26784
  • #26819 Fixes a bug where data loss could occur during FPM with tiered storage disabled by @oleiman in #26822
  • PR #26817 [v24.2.x] [CORE-8805] dt/archival: Decrease manifest upload interval to avoid race by @oleiman
  • PR #26770 [v24.2.x] raft/heartbeat_manager: timeout memory leak fix by @joe-redpanda

Full Changelog: v24.2.26...v24.2.27

v25.1.8

15 Jul 16:28
bb0d2f4

Bug Fixes

  • Fixed an issue with consumer groups with manually assigned consumers. When an OffsetDeleteRequest was sent on such a group, a GROUP_SUBSCRIBED_TO_TOPIC error was returned. by @IoannisRP in #26704
  • Fixes a bug where data loss could occur during FPM with tiered storage disabled by @oleiman in #26818
  • Fixes a hang in RPC dispatch that may result in failed replication and leadership transfers. by @bharathv in #26805
  • Increase the default self check timeout from 5s to 10s to leave time to retry DNS lookups if they time out during a self check operation. by @pgellert in #26776
  • Resolves a memory leak scenario in node_status_backend by resetting connections which make no progress by @joe-redpanda in #26800
  • Return the correct error response if the RPC to the leader for deleting ACLs fails. by @BenPope in #26783
  • This fixes a bug in Redpanda's self-check functionality, where the self-check would occasionally fail with 'Uploaded key/payload could not be found in cloud storage item list.' despite the object being successfully uploaded. This issue occurred when testing against an Azure ABS tiered storage endpoint. by @pgellert in #26713
  • #26738 Fixes a bug in which a segment produced by adjacent merge compaction did not have its batch cache reset, leading to potentially stale reads in the storage layer. by @WillemKauf in #26740

Improvements

  • Fall back to the previously uploaded cluster manifest's group offset snapshot if uploading the group offsets fails for a consumer offsets topic partition. by @pgellert in #26793
  • Make segment download timeouts configurable in cloud cache hydration by @oleiman in #26780
  • PR #26641 [v25.1.x] [CORE-8392] http: Add shutdown connection error code by @Lazin
  • PR #26699 [backport v25.1.x] iceberg/config: mark iceberg auth options as restored (default) by @wdberkeley
  • PR #26734 [v25.1.x] [CORE-12729] debug/bundle: forward kubernetes env vars to rpk by @IoannisRP
  • PR #26771 [v25.1.x] raft/heartbeat_manager: timeout memory leak fix by @joe-redpanda

Full Changelog: v25.1.7...v25.1.8

v25.1.7

04 Jul 15:02
a66e19e

Bug Fixes

  • Fixes unbounded memory usage in some transaction use cases by @bharathv in #26681

Improvements

  • PR #26649 [v25.1.x] Backport AWS Glue REST catalog support by @wdberkeley
  • PR #26666 [v25.1.x] Added defensive checks when materializing batch records by @mmaslankaprv
  • PR #26674 [v25.1.x] c/rm_frontend: more nuanced mapping of error when locking writes by @mmaslankaprv
  • PR #26701 [backport v25.1.x] datalake: add default partition spec kludge for AWS Glue by @wdberkeley

Full Changelog: v25.1.6...v25.1.7

v25.1.6

03 Jul 11:30
065cf80

Bug Fixes

  • Fix Avro translation to Iceberg when root Avro schema is a primitive type. by @nvartolomei in #26461
  • Fix an issue where the audit log could lock down a cluster if misconfigured. It is now always possible to disable it. by @IoannisRP in #26652
  • Iceberg integration: Encode Avro record field names to avoid using disallowed characters (like dots with default partitioning hour(redpanda.timestamp)). This makes it possible to read Iceberg data with the latest version of DuckDB. by @nvartolomei in #26535
  • Prevents Redpanda from crashing when reading invalid record data by @mmaslankaprv in #26492

Improvements

  • Adds support for the Iceberg table properties write.metadata.path and write.data.path. When an Iceberg catalog defines these properties, Redpanda will use them to determine where to write Iceberg table metadata and data, respectively, instead of using default locations based on the table location. by @wdberkeley in #26440
  • Allows direct uploading of debug bundles collected with rpk debug remote-bundle by @JFlath in #26515
  • Reduced the time spent in fstat() syscalls during storage layer housekeeping, and in the storage layer more generally. by @WillemKauf in #26656
  • Fixed large allocation issues when handling OffsetCommits by @mmaslankaprv in #26414
  • Fixes an issue in which users could experience oversized allocations during a DescribeGroup request. by @WillemKauf in #26532
  • Adds the ability to control batch cache settings for the __consumer_offsets topic by @mmaslankaprv in #26558
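
The write.metadata.path and write.data.path behaviour described in the list above can be illustrated with a minimal sketch. This is not Redpanda's implementation; the function name resolve_write_paths is hypothetical, and the fallback locations (a metadata and a data directory under the table location) are assumed from the usual Iceberg defaults.

```python
from typing import Dict, Tuple


def resolve_write_paths(table_location: str, properties: Dict[str, str]) -> Tuple[str, str]:
    """Return (metadata_path, data_path) for an Iceberg table.

    Hypothetical helper: prefers the catalog-defined table properties and
    falls back to locations derived from the table location (assumed defaults).
    """
    base = table_location.rstrip("/")
    metadata_path = properties.get("write.metadata.path", f"{base}/metadata")
    data_path = properties.get("write.data.path", f"{base}/data")
    return metadata_path, data_path


# Example: the catalog overrides only the data path.
paths = resolve_write_paths(
    "s3://bucket/warehouse/tbl/",
    {"write.data.path": "s3://other-bucket/tbl-data"},
)
```

In this sketch each property is resolved independently, so a catalog that sets only one of the two still gets the derived default for the other.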

Full Changelog: v25.1.5...v25.1.6

v24.3.17

04 Jul 02:58
8315307

Bug Fixes

  • Fix an issue where the audit log could lock down a cluster if misconfigured. It is now always possible to disable it. by @IoannisRP in #26651
  • Fixes unbounded memory usage in some transaction use cases by @bharathv in #26682

Improvements

  • Adds the ability to control batch cache settings for the __consumer_offsets topic by @mmaslankaprv in #26658
  • PR #25397 [v24.3.x] [CORE-8946] cloud_storage: Update process_anomalies method by @Lazin
  • PR #26177 [v24.3.x] r/consensus: do not block leadership completely in maintenance mode by @mmaslankaprv
  • PR #26526 [v24.3.x] ducktape: Respect rpk timeout in rpk by @StephanDollberg
  • PR #26548 [v24.3.x] [CORE-12155] Introduce external timeout for cloud_storage client leases by @oleiman
  • PR #26554 [v24.3.x] Fix race between bootstrap and shutdown by @bashtanov
  • PR #26578 [v24.3.x] csc/client_pool: Add null checks in lease watchdog handler by @oleiman
  • PR #26599 [v24.3.x] kc/consumer: fixed resource leak when coordinator changes by @mmaslankaprv
  • PR #26638 [v24.3.x] raft: Handle exceptions in backgroun_apply_fiber by @Lazin
  • PR #26640 [v24.3.x] [CORE-8392] http: Add shutdown connection error code by @Lazin
  • PR #26665 [v24.3.x] Added defensive checks when materializing batch records by @mmaslankaprv

Full Changelog: v24.3.16...v24.3.17

v24.2.26

03 Jul 21:26
ce19532

Bug Fixes

  • Fixes unbounded memory usage in some transaction use cases by @bharathv in #26683
  • Prevents Redpanda from crashing when reading invalid record data by @mmaslankaprv in #26493
  • PR #26433 [v24.2.x] storage: fix index state truncate overflow by @andrwng

Full Changelog: v24.2.25...v24.2.26

v24.3.16

24 Jun 04:19
88ea42b

Bug Fixes

Improvements

  • Allows direct uploading of debug bundles collected with rpk debug remote-bundle by @JFlath in #26514
  • Fixes an issue in which users could experience oversized allocations during a DescribeGroup request. by @WillemKauf in #26531
  • PR #26444 [v24.3.x] storage: call reserve() in storage::range() by @WillemKauf

Full Changelog: v24.3.15...v24.3.16

v24.3.15

18 Jun 04:29
0d70730

Bug Fixes

  • Allow partition balancing to operate when space management is enabled but the local target capacity is unset. by @ztlpn in #26306
  • Enable TCP keepalive for cloud storage connections. by @Lazin in #26409
  • Fix Redpanda crash if partition_autobalancing_concurrent_moves was set to 0. by @ztlpn in #26306
  • #26190 Fixes a bug in which a broker would crash during sliding window compaction when started with log_compaction_use_sliding_window=false and its value was later set to true without restarting. by @WillemKauf in #26196
  • partition_autobalancing_mode=off now stops on-demand partition rebalance as well. by @ztlpn in #26306
  • PR #26082 [v24.3.x] archival: Fix archival_stm_snapshot installation by @Lazin
  • PR #26277 [v24.3.x] r/consensus: stop consumable offset monitor by @mmaslankaprv
  • PR #26307 [v24.3.x] Fix archival STM shutdown race by @bashtanov
  • PR #26432 [v24.3.x] storage: fix index state truncate overflow by @andrwng

Improvements

  • Improved health reporting for rf=1 partitions by @mmaslankaprv in #26178
  • In AlterPartitionReassignmentsResponse, the per-partition REASSIGNMENT_IN_PROGRESS error code is now returned if a reassignment is requested while the Partition Balancer is moving partition replicas. by @bashtanov in #26349
  • Made it easier to detect and diagnose node operation issues by @mmaslankaprv in #26176
  • #26202 Adds the storage_log_adjacent_segments_compacted metric for better observability into adjacent segment compaction. by @WillemKauf in #26204
  • #26252 rpk: decommission-status reports reallocation failure details by @daisukebe in #26263
  • rpk debug bundle: improve reliability of debug bundle collection in k8s environments. by @r-vasquez in #26163
  • PR #26108 [v24.3.x] make ntp_callbacks actually support multiple callbacks by @bashtanov
  • PR #26173 [v24.3.x] storage: add _segment_cleanly_compacted to probe (Manual backport) by @WillemKauf
  • PR #26176 [v24.3.x] Decommission status improvements by @mmaslankaprv
  • PR #26223 [v24.3.x] c/archival: wakeup upload loop after flush by @ztlpn
  • PR #26230 [v24.3.x] storage: output full segment in WARN log in offset_to_filepos.cc by @WillemKauf
  • PR #26339 [v24.3.x] c/partition_manager: added log entries when partition is being shutdown by @mmaslankaprv

Full Changelog: v24.3.14...v24.3.15

v24.2.25

18 Jun 14:39
f49a910

Bug Fixes

  • Allow partition balancing to operate when space management is enabled but the local target capacity is unset. by @ztlpn in #26305
  • Enable TCP keepalive for cloud storage connections. by @Lazin in #26410
  • Fix Redpanda crash if partition_autobalancing_concurrent_moves was set to 0. by @ztlpn in #26305
  • When Tiered Storage is paused and data is allowed to expire from local storage, gaps appear between the last offset in tiered storage and the first offset in local storage. If local storage was truncated in the middle of a segment (e.g. by time-based retention or via trim-prefix/delete records commands), tiered storage could get stuck with the following exception: Failed to schedule upload: std::runtime_error (ntp {kafka/foo/0}: log offset N is outside the translation range (starting at M > N)). This is fixed by adjusting the upload start offset to the first available and valid offset. Although the segment may contain slightly more data, other information about that data (e.g. offset translation) is lost with prefix truncation. by @nvartolomei in #26064
  • partition_autobalancing_mode=off now stops on-demand partition rebalance as well. by @ztlpn in #26305
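
The upload start offset adjustment described in the Tiered Storage fix above can be pictured with a minimal sketch. It is an illustration only, not the actual Redpanda code; the function and parameter names are hypothetical, and it assumes the "first available and valid offset" is simply the start of the remaining offset translation range.

```python
def adjusted_upload_start(requested_start: int, translation_range_start: int) -> int:
    """Clamp the upload start offset to the first offset that can still be translated.

    Hypothetical helper: requested_start plays the role of N and
    translation_range_start the role of M in the error message above.
    """
    # If prefix truncation removed translation state below M, starting at N < M
    # would fail, so begin at M instead. Otherwise keep the requested offset.
    return max(requested_start, translation_range_start)
```

For example, a requested start of 5 with a translation range starting at 10 would be moved up to 10, while a requested start of 12 would be left unchanged.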

Improvements

  • In AlterPartitionReassignmentsResponse, the per-partition REASSIGNMENT_IN_PROGRESS error code is now returned if a reassignment is requested while the Partition Balancer is moving partition replicas. by @bashtanov in #26350
  • #26135 Swap out an internal data structure in the storage layer to prevent oversized allocations and crashes when a large number of segments are present in a partition. by @WillemKauf in #26138
  • rpk transform now uses tinygo v37 to compile Go to Wasm. by @r-vasquez in #26217
  • rpk debug bundle: improve reliability of debug bundle collection in k8s environments. by @r-vasquez in #26214
  • PR #26010 [v24.2.x] create STMs based on original topic cfg by @bashtanov
  • PR #26340 [v24.2.x] c/partition_manager: added log entries when partition is being shutdown by @mmaslankaprv
  • PR #26344 [v24.2.x] raft/test/leadership_transfer_delay: increase tolerance by @bashtanov
  • PR #26345 [v24.2.x] make ntp_callbacks actually support multiple callbacks by @bashtanov
  • PR #26354 Revert "[v24.2.x] raft/log_eviction_stm: avoid unnecessary wait on visible offset" by @bharathv
  • PR #26420 [v24.2.x] Fix archival STM shutdown race by @bashtanov
  • PR #26428 [backport] [v24.2.x] raft/c: warn on struck truncation. by @bharathv
  • PR #26442 [v24.2.x] storage: call reserve() in storage::range() by @WillemKauf

Full Changelog: v24.2.24...v24.2.25