Releases · StarRocks/starrocks
3.3.14
Release Date: May 14, 2025
Improvements
- Optimized error messages for regex parsing failures. #57904
- Fixed security vulnerabilities SNYK-JAVA-ORGJSON-5488379 and SNYK-JAVA-ORGJSON-5962464. #58425
Bug Fixes
Fixed the following issues:
- Issues with the JSON data type in `first_value`/`last_value`/`lead`/`lag` window functions. #58697
- Deadlock caused by table-level locks from base tables during materialized view writes (after the bug fix, DB-level locks are used). #58615
- INSERT tasks hang when the target table is deleted. #58603
- Failure to change active/inactive state of materialized views with List partitions. #58575
- Incorrect `streaming_load_current_processing` metric. #58565
- Data version update errors caused by continuous loading and replica clone tasks. #58513
- Failed to refresh materialized views on external tables. #58506
- Incorrect `if()` results on ARM architecture. #58455
- Materialized view rewriting generated incorrect query plans. #58487
- Iceberg table metadata did not refresh automatically. #58490
- Incorrect query plan generated by `group_concat`. #57908
- Mass Tablet load failures caused by unhandled exceptions during loading. #58393
- Constant folding failed due to type mismatches while pruning List partitions with generated columns (after the bug fix, an implicit cast rule was added). #54543
- Mismatch between aggregate function return type and original column type (after the bug fix, the column type is cast to the function output type). #58407
- `broadcast_row_limit` set to 0 or below failed to prevent BROADCAST JOIN generation. #58307
- Broker Load used BE nodes that had already been blacklisted. #58350
- Asynchronous tasks persist in the background and cannot be dropped after manually cancelling materialized view refresh tasks. #58310
- Failed to create expression partitions with month or year granularity. #58182
- `ngram_search` generated invalid query plans. #58190
3.3.13
Release Date: April 22, 2025
Improvements
- Added memory consumption metrics for queries in FE in audit logs and the QueryDetail interface. #57731
- Optimized the strategy for concurrent creation of expression partitions. #57899
- Added monitoring metrics for the number of active FE nodes. #57857
- The `information_schema.task_runs` view supports pushdown of the LIMIT clause. #57404
- Fixed several CVE issues. #57705 #57620
- Primary Key tables support retry during the PUBLISH stage, enhancing system disaster recovery capabilities. #57354
- Reduced memory consumption of Flat JSON. #57357
- The `information_schema.routine_load_jobs` view adds the `timestamp_progress` column, consistent with the output of the SHOW ROUTINE LOAD statement. #57123
- Disallowed unauthorized behaviors from StarRocks to LDAP. #57131
- Supports returning an error when the schema of an AVRO file does not match the schema of the Hive table. #57296
- Materialized views support the `excluded_refresh_tables` property (see the sketch below). #56428
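A minimal sketch of how this property might be set, assuming it is configured through `ALTER MATERIALIZED VIEW` and takes a comma-separated list of base tables to exclude from refresh; the view name, table name, and value format are illustrative:

```sql
-- Hypothetical example: exclude a frequently changing base table from
-- participating in refreshes of an asynchronous materialized view.
-- The property name comes from the release note; the value format is assumed.
ALTER MATERIALIZED VIEW sales_summary_mv
SET ("excluded_refresh_tables" = "dim_exchange_rate");
```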
Bug Fixes
Fixed the following issues:
- Flat JSON does not support the `get_json_bool` function. #58077
- SHOW AUTHENTICATION statement returns the password. #58072
- The `percentile_count` function returns incorrect values. #58038
- Issues caused by spilling strategies. #58022
- After a BE is blacklisted, Stream Load still dispatches tasks to the BE, causing task failures. #57919
- Issues when using the `cast` function with semi-structured data types. #57804
- The `array_map` function returns incorrect values. #57756
- In the scenario of a single tablet, using multiple `distinct` functions on the same column with a single-column GROUP BY clause leads to incorrect query results. #57690
- MIN/MAX values in the profiles of big queries are inaccurate. #57655
- Non-partitioned materialized views based on Delta Lake data cannot rewrite queries. #57686
- A Routine Load deadlock issue. #57430
- Predicate pushdown issues with DATE/DATETIME columns. #57576
- An issue when the `percentile_disc` function has an empty input. #57572
- When modifying the bucket distribution of a table with the statement `ALTER TABLE {table} PARTITIONS (p1, p1) DISTRIBUTED BY ...`, specifying duplicate partition names could result in failure to delete internally generated temporary partitions. #57005
- ALTER TABLE MODIFY COLUMN fails with expression partitioned tables based on the `str2date` function. #57487
- CACHE SELECT issue with semi-structured columns. #57448
- Upgrade compatibility issue caused by `hadoop-lib`. #57436
- Case sensitivity error issues when creating partitions. #54867
- Some columns generate incorrect sort keys during updates. #57375
- Unknown issues caused by nested window functions. #57216
3.4.2
Release Date: April 10, 2025
Improvements
- FE supports graceful shutdown to improve system availability. When exiting FE via `./stop_fe.sh -g`, FE will first return a 500 status code to the front-end Load Balancer via the `/api/health` API to indicate that it is preparing to shut down, allowing the Load Balancer to switch to other available FE nodes. Meanwhile, FE will continue to run ongoing queries until they finish or time out (default timeout: 60 seconds). #56823
Bug Fixes
The following issues have been fixed:
- Partition pruning might not work if the partition column is a generated column. #54543
- Incorrect parameter handling in the `concat` function could cause a BE crash during query execution. #57522
- The `ssl_enable` property did not take effect when using Broker Load to load data. #57229
- When NULL values exist, querying subfields of STRUCT-type columns could cause a BE crash. #56496
- When modifying the bucket distribution of a table with the statement `ALTER TABLE {table} PARTITIONS (p1, p1) DISTRIBUTED BY ...`, specifying duplicate partition names could result in failure to delete internally generated temporary partitions. #57005
- In a shared-data cluster, running `SHOW PROC '/current_queries'` resulted in the error "Error 1064 (HY000): Sending collect query statistics request fails". #56597
- Running `INSERT OVERWRITE` loading tasks in parallel caused the error "ConcurrentModificationException: null", resulting in loading failure. #56557
- After upgrading from v2.5.21 to v3.1.17, running multiple Broker Load tasks concurrently could cause exceptions. #56512
Behavior Changes
- The default value of the BE configuration item `avro_ignore_union_type_tag` has been changed to `true`, enabling the direct parsing of `["NULL", "STRING"]` as STRING type data, which better aligns with typical user requirements. #57553
- The default value of the session variable `big_query_profile_threshold` has been changed from 0 to 30 (seconds) (see the sketch after this list). #57177
- A new FE configuration item `enable_mv_refresh_collect_profile` has been added to control whether to collect Profile information during materialized view refresh. The default value is `false` (previously, the system collected Profile by default). #56971
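For reference, a quick way to inspect or override the new session-variable default; the value shown is illustrative:

```sql
-- Inspect the current value of the session variable.
SHOW VARIABLES LIKE 'big_query_profile_threshold';

-- Override it for the current session. The release note describes the value
-- in seconds; depending on the version, the variable may instead accept a
-- duration string such as '60s'.
SET big_query_profile_threshold = 60;
```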
3.3.12
Release date: April 3, 2025
New Features
- Supports the `percentile_approx_weighted` function (see the sketch after this list). #56654
- Supports modifying properties of Hive Catalog and Hudi Catalog. #56212
- Paimon Catalog supports manifest cache. #55788
- Supports `SHOW PARTITIONS` for tables in Paimon Catalog. #55785
- Supports statistics collection for Paimon Catalog. #55757
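A hedged sketch of the new function, assuming the common signature `percentile_approx_weighted(value, weight, percentile)`; the table and column names are made up:

```sql
-- Approximate weighted median of response latency, weighting each row by its
-- request count. The (value, weight, percentile) signature is an assumption.
SELECT percentile_approx_weighted(latency_ms, request_count, 0.5) AS weighted_p50
FROM request_stats;
```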
Improvements
- Various improvements and bug fixes related to statistics. #57147 #57238 #57170 #57154 #57124 #57047 #56956 #57031 #56904 #56950 #56671 #55922
- Optimized error messages when table creation fails. #57055
- Enhanced retry mechanism for Broker Load. #56987
- Improved performance of `array_generate`. #57252
- Aborted ongoing Compaction tasks for deleted partitions. #56943
- Optimized error messages when `ALTER TABLE` fails. #57054
- Removed unnecessary reverse step from `array_agg()` to improve performance. #56958
- Added checksum verification for replicas in Primary Key tables. #56519
- Masked sensitive information in the `FILES` function output. #56684
- Reduced noisy logs related to materialized views. #56672
- Upgraded Iceberg version to 1.7.1. #55271
Bug Fixes
- `INSERT INTO FILES` did not support CSV delimiter conversion. #57126
- Issues with Iceberg REST Catalog. #55416
- Predicate was lost during rewrite for view-based materialized views. #57153
- Paimon Catalog failed to read tables with schema changes. #56796
- Timezone conversion issue in Paimon Catalog. #56879
- `SHOW MATERIALIZED VIEWS` did not display `default_catalog` information. #56362
- In Trino dialect mode, time strings containing 'T' were not accepted. (Solution: replaced `parse_datetime` with `str_to_jodatime`.) #56565
- Incorrect result of `first_value` function. #56467
- Incorrect result of `concat_ws` function. #56384
3.4.1
Release Date: March 12, 2025
New Features and Enhancements
- Data lake analytics supports Deletion Vector in Delta Lake.
- Supports secure views. By creating a secure view, you can prevent users without the SELECT privilege on the referenced base tables from querying the view (even if they have the SELECT privilege on the view).
- Supports Sketch HLL (`ds_hll_count_distinct`). Compared to `approx_count_distinct`, this function provides higher-precision approximate deduplication (see the sketch after this list).
- Storage Volume in shared-data clusters supports Azure Data Lake Storage Gen2.
- Supports SSL authentication for connections to StarRocks via the MySQL protocol, ensuring that data transmitted between the client and the StarRocks cluster cannot be read by unauthorized users.
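A minimal sketch comparing the new Sketch-HLL function with `approx_count_distinct`; the table and column are illustrative, and any optional precision arguments are omitted:

```sql
-- Higher-precision approximate distinct count via the DataSketches HLL function,
-- alongside the existing approx_count_distinct for comparison.
SELECT
    approx_count_distinct(user_id) AS approx_users,
    ds_hll_count_distinct(user_id) AS sketch_users
FROM events;
```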
Bug Fixes
The following issues have been fixed:
- An issue where OLAP views affected the materialized view processing logic. #52989
- Write transactions would fail if one replica was not found, regardless of how many replicas had successfully committed. (After the fix, the transaction succeeds as long as the majority of replicas succeed.) #55212
- Stream Load failed when a node with an Alive status of false was scheduled. #55371
- Files in cluster snapshots were mistakenly deleted. #56338
Behavior Changes
- Graceful shutdown is now enabled by default (previously it was disabled). The default value of the related BE/CN parameter `loop_count_wait_fragments_finish` has been changed to `2`, meaning that the system will wait up to 20 seconds for running queries to complete. #56002
3.3.11
Release date: March 7, 2025
Improvements
- `FILES` supports exporting JSON type data into Parquet files (see the sketch after this list). #56406
- Optimized Data Cache WarmUp performance for cloud-native tables in shared-data clusters. #56190
- Supports parsing `AT TIME ZONE` expressions and the `from_iso8601_timestamp` function in Trino. #56311 #55573
- Partial Updates for Primary Key tables within shared-data clusters support Condition Updates. #56132
- Extended support for statistics collection across all types of SQL statements. #56257
- Supports configuring the maximum number of returned rows for `SHOW PROC '/transaction'`. #55933
- Supports creating asynchronous materialized views on Oracle-type JDBC Catalog tables. #55372
- MemTracker on BE WebUI supports pagination with 25 rows per page. #56206
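A sketch of unloading query results, including a JSON column, to Parquet with `INSERT INTO FILES`; the S3 path, credentials, and table/column names are placeholders:

```sql
-- Unload rows, including a JSON-typed column, into Parquet files on S3.
-- Path, credential properties, and table/column names are placeholders.
INSERT INTO FILES(
    "path" = "s3://my-bucket/export/events/",
    "format" = "parquet",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
)
SELECT event_id, event_time, payload_json
FROM events;
```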
Bug Fixes
Fixed the following issues:
- FE does not support casting constant TIME data types into DATETIME. #55804
- Stream Load transaction interface does not support the `starrocks_fe_table_load_rows` and `starrocks_fe_table_load_bytes` metrics. #44991
- Changes to automatic statistics collection do not take effect. #56173
- Materialized views in abnormal states caused issues with `SHOW MATERIALIZED VIEWS`. #55995
- Text-based materialized view rewrite does not work across different databases. #56001
- Metadata compatibility issues in JDBC Catalogs. #55993
- Issues with handling the JSON data type in JDBC Catalogs. #56008
- Incorrect Sort Key settings during Schema Change. #55902
- Credential information leak issue in Broker Load. #55358
Behavior Changes
- Added authentication to the `query_detail` interface in FE. #55919
3.2.15
Release date: February 14, 2025
New Features
- Window functions support max_by and min_by. #54961
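A brief sketch of `max_by` used as a window function; the table and columns are illustrative:

```sql
-- For each row, report the product with the highest revenue within its category.
SELECT
    category,
    product,
    revenue,
    max_by(product, revenue) OVER (PARTITION BY category) AS top_product_in_category
FROM sales;
```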
Improvements
- Added StarClient timeout parameters. #54496
  - `star_client_read_timeout_seconds`
  - `star_client_list_timeout_seconds`
  - `star_client_write_timeout_seconds`
- Tables with List partitioning strategies support partition pruning for DELETE statements. #55400
3.4.0
Release date: January 24, 2025
Data Lake Analytics
- Optimized Iceberg V2 query performance and lowered memory usage by reducing repeated reads of delete-files.
- Supports column mapping for Delta Lake tables, allowing queries against data after Delta Schema Evolution. For more information, see Delta Lake catalog - Feature support.
- Data Cache related improvements:
- Introduces a Segmented LRU (SLRU) Cache eviction strategy, which significantly defends against cache pollution from occasional large queries, improves cache hit rate, and reduces fluctuations in query performance. In simulated test cases with large queries, SLRU-based query performance can be improved by 70% or even higher. For more information, see Data Cache - Cache replacement policies.
- Unified the Data Cache instance used in both shared-data architecture and data lake query scenarios to simplify the configuration and improve resource utilization. For more information, see Data Cache.
- Provides an adaptive I/O strategy optimization for Data Cache, which flexibly routes some query requests to remote storage based on the cache disk's load and performance, thereby enhancing overall access throughput.
- Supports automatic collection of external table statistics through automatic ANALYZE tasks triggered by queries. It can provide more accurate NDV information compared to metadata files, thereby optimizing the query plan and improving query performance. For more information, see Query-triggered collection.
- Provides Time Travel query capability for Iceberg, allowing data to be read from a specified BRANCH or TAG by specifying TIMESTAMP or VERSION (see the sketch after this list).
- Supports asynchronous delivery of query fragments for data lake queries. It avoids the restriction that FE must obtain all files to be queried before BE can execute a query, thus allowing FE to fetch query files and BE to execute queries in parallel, and reducing the overall latency of data lake queries involving a large number of files that are not in the cache. Meanwhile, it reduces the memory load on FE due to caching the file list and improves query stability. (Currently, the optimization for Hudi and Delta Lake is implemented, while the optimization for Iceberg is still under development.)
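A hedged sketch of the Time Travel capability described above; the exact clause names may differ by version, and the catalog, table, timestamp, and snapshot ID are placeholders:

```sql
-- Read an Iceberg table as of a past point in time (assumed syntax).
SELECT * FROM iceberg_catalog.sales_db.orders
FOR TIMESTAMP AS OF '2025-01-01 00:00:00';

-- Read a specific snapshot/version (assumed syntax; snapshot ID is a placeholder).
SELECT * FROM iceberg_catalog.sales_db.orders
FOR VERSION AS OF 4567281094253132987;
```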
Performance Improvement and Query Optimization
- [Experimental] Offers a preliminary Query Feedback feature for automatic optimization of slow queries. The system will collect the execution details of slow queries, automatically analyze their query plans for potential optimization opportunities, and generate a tailored optimization guide for each query. If CBO generates the same bad plan for subsequent identical queries, the system will locally optimize this query plan based on the guide. For more information, see Query Feedback.
- [Experimental] Supports Python UDFs, offering more convenient function customization compared to Java UDFs. For more information, see Python UDF.
- [Experimental] Supports Arrow Flight interface for more efficient reading of large data volumes in query results. It also allows BE, instead of FE, to process the returned results, greatly reducing the pressure on FE. It is especially suitable for business scenarios involving big data analysis and processing, and machine learning.
- Enables the pushdown of multi-column OR predicates, allowing queries with multi-column OR conditions (for example, `a = xxx OR b = yyy`) to utilize certain column indexes, thus reducing data read volume and improving query performance (see the sketch after this list).
- Optimized TPC-DS query performance by roughly 20% under the TPC-DS 1TB Iceberg dataset. Optimization methods include table pruning and aggregated column pruning using primary and foreign keys, and aggregation pushdown.
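For example, a query of the shape mentioned above that can now benefit from per-column index pushdown; the table and columns are illustrative:

```sql
-- Both branches of the OR can now be pushed down and use the respective
-- column indexes instead of forcing a full scan of the table.
SELECT order_id, customer_id, region
FROM orders
WHERE customer_id = 10001 OR region = 'EU';
```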
Shared-data Enhancements
- Supports Query Cache, aligning with the shared-nothing architecture.
- Supports synchronous materialized views, aligning with the shared-nothing architecture.
Storage Engine
- Unified all partitioning methods into expression partitioning and supported multi-level partitioning, where each level can be any expression (see the sketch after this list). For more information, see Expression Partitioning.
- [Preview] Supports all native aggregate functions in Aggregate tables. By introducing a generic aggregate function state storage framework, all native aggregate functions supported by StarRocks can be used to define an Aggregate table.
- Supports vector indexes, enabling fast approximate nearest neighbor searches (ANNS) of large-scale, high-dimensional vectors, which are commonly required in deep learning and machine learning scenarios. Currently, StarRocks supports two types of vector indexes: IVFPQ and HNSW.
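A minimal sketch of a table using multi-level expression partitioning; the schema, expressions, and key choice are illustrative, and the multi-expression `PARTITION BY` form is assumed from the item above:

```sql
-- Two-level expression partitioning: first by month of the event time,
-- then by the raw region column. Table schema is illustrative.
CREATE TABLE events (
    event_time DATETIME NOT NULL,
    region     VARCHAR(32) NOT NULL,
    user_id    BIGINT,
    payload    JSON
)
DUPLICATE KEY (event_time, region)
PARTITION BY date_trunc('month', event_time), region
DISTRIBUTED BY HASH(user_id);
```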
Loading
- INSERT OVERWRITE now supports a new semantic - Dynamic Overwrite. When this semantic is enabled, the ingested data will either create new partitions or overwrite existing partitions that correspond to the new data records. Partitions not involved will not be truncated or deleted. This semantic is especially useful when users want to recover data in specific partitions without specifying the partition names. For more information, see Dynamic Overwrite.
- Optimized the data ingestion with INSERT from FILES to replace Broker Load as the preferred loading method:
- FILES now supports listing files in remote storage, and providing basic statistics of the files. For more information, see FILES - list_files_only.
- INSERT now supports matching columns by name, which is especially useful when users load data from numerous columns with identical names. (The default behavior matches columns by their position.) For more information, see Match column by name.
- INSERT supports specifying PROPERTIES, aligning with other loading methods. Users can specify `strict_mode`, `max_filter_ratio`, and `timeout` for INSERT operations to control the behavior and quality of the data ingestion (see the sketch after this list). For more information, see INSERT - PROPERTIES.
- INSERT from FILES supports pushing down the target table schema check to the Scan stage of FILES to infer a more accurate source data schema. For more information, see Push down target table schema check.
- FILES supports unionizing files with different schemas. The schemas of Parquet and ORC files are unionized based on the column names, and those of CSV files are unionized based on the position (order) of the columns. When there are mismatched columns, users can choose to fill the columns with NULL or return an error by specifying the property `fill_mismatch_column_with`. For more information, see Union files with different schema.
- FILES supports inferring the STRUCT type data from Parquet files. (In earlier versions, STRUCT data was inferred as STRING type.) For more information, see Infer STRUCT type from Parquet.
- Supports merging multiple concurrent Stream Load requests into a single transaction and committing data in a batch, thus improving the throughput of real-time data ingestion. It is designed for high-concurrency, small-batch (from KB to tens of MB) real-time loading scenarios. It can reduce the excessive data versions caused by frequent loading operations, resource consumption during Compaction, and IOPS and I/O latency brought by excessive small files.
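A sketch combining two of the items above — loading from FILES() with per-INSERT PROPERTIES — assuming PROPERTIES is placed after the target table; the path, credentials, and thresholds are placeholders:

```sql
-- Load Parquet files from S3 with per-statement ingestion controls.
-- Property names follow the items above; values and paths are placeholders.
INSERT INTO sales_orders
PROPERTIES (
    "strict_mode" = "true",
    "max_filter_ratio" = "0.05",
    "timeout" = "3600"
)
SELECT * FROM FILES(
    "path" = "s3://my-bucket/input/orders/*.parquet",
    "format" = "parquet",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
);
```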
Others
- Optimized the graceful exit process of BE and CN by accurately displaying the status of BE or CN nodes during a graceful exit as `SHUTDOWN`.
- Optimized log printing to avoid excessive disk space being occupied.
- Shared-nothing clusters now support backing up and restoring more objects: logical view, external catalog metadata, and partitions created with expression partitioning and list partitioning strategies.
- [Preview] Supports CheckPoint on Follower FE to avoid excessive memory on Leader FE during CheckPoint, thereby improving the stability of Leader FE.
Downgrade Notes
- Clusters can be downgraded from v3.4.0 only to v3.3.9 and later.
3.4.0-RC01
Release date: January 13, 2025
Data Lake Analytics
- Optimized Iceberg V2 query performance and lowered memory usage by reducing repeated reads of delete-files.
- Supports column mapping for Delta Lake tables, allowing queries against data after Delta Schema Evolution. For more information, see Delta Lake catalog - Feature support.
- Data Cache related improvements:
- Introduces a Segmented LRU (SLRU) Cache eviction strategy, which significantly defends against cache pollution from occasional large queries, improves cache hit rate, and reduces fluctuations in query performance. In simulated test cases with large queries, SLRU-based query performance can be improved by 70% or even higher. For more information, see Data Cache - Cache replacement policies.
- Unified the Data Cache instance used in both shared-data architecture and data lake query scenarios to simplify the configuration and improve resource utilization. For more information, see Data Cache.
- Provides an adaptive I/O strategy optimization for Data Cache, which flexibly routes some query requests to remote storage based on the cache disk's load and performance, thereby enhancing overall access throughput.
- Supports automatic collection of external table statistics through automatic ANALYZE tasks triggered by queries. It can provide more accurate NDV information compared to metadata files, thereby optimizing the query plan and improving query performance. For more information, see Query-triggered collection.
Performance Improvement and Query Optimization
- [Experimental] Offers a preliminary Query Feedback feature for automatic optimization of slow queries. The system will collect the execution details of slow queries, automatically analyze their query plans for potential optimization opportunities, and generate a tailored optimization guide for each query. If CBO generates the same bad plan for subsequent identical queries, the system will locally optimize this query plan based on the guide. For more information, see Query Feedback.
- [Experimental] Supports Python UDFs, offering more convenient function customization compared to Java UDFs. For more information, see Python UDF.
Shared-data Enhancements
- Supports Query Cache, aligning with the shared-nothing architecture.
- Supports synchronous materialized views, aligning with the shared-nothing architecture.
Storage Engine
- Unified all partitioning methods into expression partitioning and supported multi-level partitioning, where each level can be any expression. For more information, see Expression Partitioning.
Loading
- INSERT OVERWRITE now supports a new semantic - Dynamic Overwrite. When this semantic is enabled, the ingested data will either create new partitions or overwrite existing partitions that correspond to the new data records. Partitions not involved will not be truncated or deleted. This semantic is especially useful when users want to recover data in specific partitions without specifying the partition names. For more information, see Dynamic Overwrite.
- Optimized the data ingestion with INSERT from FILES to replace Broker Load as the preferred loading method:
- FILES now supports listing files in remote storage, and providing basic statistics of the files. For more information, see FILES - list_files_only.
- INSERT now supports matching columns by name, which is especially useful when users load data from numerous columns with identical names. (The default behavior matches columns by their position.) For more information, see Match column by name.
- INSERT supports specifying PROPERTIES, aligning with other loading methods. Users can specify `strict_mode`, `max_filter_ratio`, and `timeout` for INSERT operations to control the behavior and quality of the data ingestion. For more information, see INSERT - PROPERTIES.
- INSERT from FILES supports pushing down the target table schema check to the Scan stage of FILES to infer a more accurate source data schema. For more information, see Push down target table schema check.
- FILES supports unionizing files with different schemas. The schemas of Parquet and ORC files are unionized based on the column names, and those of CSV files are unionized based on the position (order) of the columns. When there are mismatched columns, users can choose to fill the columns with NULL or return an error by specifying the property `fill_mismatch_column_with`. For more information, see Union files with different schema.
- FILES supports inferring the STRUCT type data from Parquet files. (In earlier versions, STRUCT data was inferred as STRING type.) For more information, see Infer STRUCT type from Parquet.
Others
- Optimized the graceful exit process of BE and CN by accurately displaying the status of BE or CN nodes during a graceful exit as `SHUTDOWN`.
- Optimized log printing to avoid excessive disk space being occupied.
Downgrade Notes
- Clusters can be downgraded from v3.4.0 only to v3.3.9 and later.
3.3.9
Release date: January 12, 2025
New Features
- Supports the translation of Trino SQL into StarRocks SQL. #54185
Improvements
- Corrected FE node names starting with `bdbje_reset_election_group` to enhance clarity. #54399
- Implemented vectorization for the `IF` function on ARM architectures. #53093
- `ALTER SYSTEM CREATE IMAGE` supports creating an image for StarManager. #54370
- Supports deleting cloud-native indexes of Primary Key tables in shared-data clusters. #53971
- Enforced the refresh of materialized views when the `FORCE` keyword is specified. #52081
- Supports specifying hints in `CACHE SELECT`. #54697
- Supports loading compressed CSV files using the `FILES()` function (see the sketch after this list). Supported compression formats include gzip, bz2, lz4, deflate, and zstd. #54626
- Supports assigning multiple values to the same column in an `UPDATE` statement. #54534
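A sketch of loading a gzip-compressed CSV via FILES(); the path, credentials, CSV options, and the `compression` property name are assumptions to verify against the FILES() documentation for your version:

```sql
-- Load a gzip-compressed CSV file from S3 into an existing table.
-- The "compression" property name and CSV options are assumptions; the
-- path and credential values are placeholders.
INSERT INTO user_events
SELECT * FROM FILES(
    "path" = "s3://my-bucket/raw/events-2025-01-12.csv.gz",
    "format" = "csv",
    "compression" = "gzip",
    "csv.column_separator" = ",",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
);
```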
Bug Fixes
Fixed the following issues:
- Unexpected errors when refreshing materialized views built on JDBC catalogs. #54487
- Instability in results when a Delta Lake table joins itself. #54473
- Upload retries fail when backing up data to HDFS. #53679
- BFD initialization errors on the aarch64 architecture. #54372
- Sensitive information recorded in BE logs. #54677
- Errors in Compaction-related metrics in profiles. #54678
- BE crashes caused by creating tables with nested `TIME` types. #54601
- Query plan errors for `LIMIT` queries with subquery TOP-N. #54507
Downgrade Notes
- Clusters can be downgraded from v3.3.9 only to v3.2.11 and later.