Releases: StarRocks/starrocks
3.5.2
Release Date: July 18, 2025
Improvements
- Collected NDV (number of distinct values) statistics for ARRAY columns to improve query plan accuracy. #60623
- Disabled replica balancing for Colocate tables and tablet scheduling in Shared-data clusters to reduce unnecessary log output. #60737
- Optimized the Catalog access workflow: at startup, FE now defers access to external data sources and performs it asynchronously, which prevents FE from hanging when external services are unavailable. #60614
- Added the session variable `enable_predicate_expr_reuse` to control predicate pushdown. #60603
- Supports a retry mechanism when fetching Kafka partition information fails. #60513
- Removed the restriction requiring exact mapping of partition columns between materialized views and base tables. #60565
- Supports building Runtime In-Filters to enhance aggregation performance by filtering data during aggregation. #59288
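The Kafka item above says only that fetching partition information is now retried on failure. The notes do not describe the retry policy. The following is a minimal sketch of retry with capped exponential backoff. `fetch_with_retry` and the attempt and delay values are hypothetical and illustrative, not StarRocks internals or defaults.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds or `max_attempts` calls have failed.

    Hypothetical helper: the attempt count and delays are illustrative
    values, not StarRocks' actual retry settings.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff: 0.5 s, 1 s, 2 s, ... capped at max_delay_s.
                sleep(min(base_delay_s * 2 ** attempt, max_delay_s))
    raise RuntimeError(f"giving up after {max_attempts} attempts") from last_error
```

Injecting `sleep` lets the sketch be exercised without real waits. A transient broker error that clears on the third call returns normally after two backoff pauses.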
Bug Fixes
Fixed the following issues:
- COUNT DISTINCT queries crash due to low-cardinality optimization for multiple columns. #60664
- Incorrect matching of global UDFs when multiple functions share the same name. #60550
- Null pointer exception (NPE) issue during Stream Load import. #60755
- Null pointer exception (NPE) issue when starting FE during a recovery from a cluster snapshot. #60604
- BE crash caused by column mode mismatch when processing short-circuit queries with out-of-order values. #60466
- Session variables set via PROPERTIES in SUBMIT TASK statements did not take effect. #60584
- Incorrect results for `SELECT min/max` queries under specific conditions. #60601
- Incorrect bucket pruning when the left side of a predicate is a function, leading to incorrect query results. #60467
- Crash when querying a non-existent `query_id` via Arrow Flight SQL. #60497
Behavior Changes
- The default value of `lake_compaction_allow_partial_success` is set to `true`. Compaction operations can now be marked as successful even if partially completed, preventing blockage of subsequent compaction tasks. #60643
3.4.5
Release Date: July 10, 2025
Improvements
- Enhanced observability of loading job execution: Unified the runtime information of loading tasks into the `information_schema.loads` view. Users can view the execution details of all INSERT, Broker Load, Stream Load, and Routine Load subtasks in this view. Additional fields have been added to help users better understand the status of loading tasks and their association with parent jobs (PIPES, Routine Load Jobs).
- Supports modifying `kafka_broker_list` via the `ALTER ROUTINE LOAD` statement.
Bug Fixes
The following issues have been fixed:
- Under high-frequency loading scenarios, Compaction could be delayed. #59998
- Querying Iceberg external tables via Unified Catalog would throw an error: `not support getting unified metadata table factory`. #59412
- When using `DESC FILES()` to view CSV files in remote storage, incorrect results were returned because the system mistakenly inferred `xinf` as the FLOAT type. #59574
- `INSERT INTO` could cause BE to crash when encountering empty partitions. #59553
- When StarRocks reads Equality Delete files in Iceberg, it could still access deleted data if the data had already been removed from the Iceberg table. #59709
- Query failures caused by renaming columns. #59178
Behavior Changes
- The default value of the BE configuration item `skip_pk_preload` has been changed from `false` to `true`. As a result, the system will skip preloading Primary Key Indexes for Primary Key tables to reduce the likelihood of `Reached Timeout` errors. This change may increase query latency for operations that require loading Primary Key Indexes.
3.5.1
Release Date: July 1, 2025
New Features
- [Experimental] Starting from v3.5.1, StarRocks introduces a high-performance data transfer channel based on the Apache Arrow Flight SQL protocol, comprehensively optimizing the data import channel and significantly improving transfer efficiency. This solution establishes a fully columnar data transfer pipeline from the StarRocks columnar execution engine to the client, eliminating the frequent row-column conversions and serialization overhead typically seen in traditional JDBC and ODBC interfaces, and achieving true zero-copy, low-latency, and high-throughput data transfer capabilities. #57956
- Java Scalar UDFs (user-defined functions) now support ARRAY and MAP types as input parameters. #55356
- Cross-node data cache sharing: Enables nodes to share cached external table data of data lakes across compute nodes via the network. If a local cache miss occurs, the system first attempts to fetch data from the caches of other nodes within the same cluster. Only if all caches miss will it re-fetch data from remote storage. This feature effectively reduces performance jitter caused by cache invalidation during elastic scaling and ensures stable query performance. A new FE configuration parameter `enable_trace_historical_node` controls this behavior (Default: `false`). #57083
- Storage Volume adds native support for Google Cloud Storage (GCS): You can now use GCS as a backend storage volume and manage and access GCS resources through the native SDK. #58815
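The cross-node cache item above describes a three-step lookup order. The following is a minimal sketch of that order only. `Node` and `read_block` are hypothetical names, peers are given as a plain list, and copying peer or remote data into the local cache is an assumption. The sketch does not model how `enable_trace_historical_node` gates the behavior or how StarRocks chooses peers.

```python
from typing import Callable, Dict, List, Tuple


class Node:
    """A compute node with an in-memory block cache (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, bytes] = {}


def read_block(
    key: str,
    local: Node,
    peers: List[Node],
    fetch_remote: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (data, source), trying local cache, then peers, then remote storage."""
    # 1. Local cache hit: no network traffic at all.
    if key in local.cache:
        return local.cache[key], "local"
    # 2. Local miss: ask the caches of other nodes in the same cluster.
    for peer in peers:
        data = peer.cache.get(key)
        if data is not None:
            local.cache[key] = data  # assumption: keep a local copy
            return data, f"peer:{peer.name}"
    # 3. Every cache missed: re-fetch from remote storage.
    data = fetch_remote(key)
    local.cache[key] = data  # assumption: populate the local cache
    return data, "remote"
```

After elastic scaling, a new node with an empty cache would be served by its peers in step 2 instead of going to remote storage. This is the jitter reduction the note describes.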
Improvements
- Optimized error messages when creating Hive external tables fails. #60076
- Optimized `count(1)` query performance using the `file_record_count` in Iceberg metadata. #60022
- Refined the Compaction scheduling logic to avoid delayed scheduling when all subtasks succeed. #59998
- Added `JAVA_OPTS="--add-opens=java.base/java.util=ALL-UNNAMED"` to BE and CN after upgrading to JDK 17. #59947
- Supports modifying the `kafka_broker_list` property via the ALTER ROUTINE LOAD command when Kafka Broker endpoints change. #59787
- Supports reducing build dependencies of the Docker base image through parameters. #59772
- Supports accessing Azure using Managed Identity authentication. #59657
- Improved error messages when querying external data via the `Files()` function with duplicate path column names. #59597
- Optimized LIMIT pushdown logic. #59265
Bug Fixes
Fixed the following issues:
- Partition pruning issue when queries include Max and Min aggregations and empty partitions. #60162
- Incorrect query results when rewriting queries with materialized views due to missing NULL partitions. #60087
- Refresh errors on Iceberg external tables when using partition expressions based on `str2date`. #60089
- Incorrect partition range when creating temporary partitions using the START END syntax. #60014
- Incorrect display of Routine Load metrics on non-leader FE nodes. #59985
- BE/CN crashes when executing queries containing `COUNT(*)` window functions. #60003
- Stream Load failures when the target table name contains Chinese characters. #59722
- Loading into a triple-replica table failed entirely when loading to one secondary replica failed. #59762
- Missing parameters in SHOW CREATE VIEW output. #59714
Behavior Changes
- Some FE metrics include the `is_leader` label. #59883
3.3.16
Release Date: July 4, 2025
Improvements
- Optimized error logs when creating Hive tables with duplicate names. #60076
- Added the FE parameter `slow_lock_print_stack` to prevent process stalls in large clusters when printing thread stacks. #59967
- Reduced unnecessary locks during tablet scheduling. #59744
Bug Fixes
Fixed the following issues:
- SplitOR fails to prune scan columns. #60223
- Incorrect query plan for null-aware left anti joins. #60119
- Incorrect query results when rewriting queries with materialized views due to missing NULL partitions. #60087
- Partition pruning errors when tables contain empty partitions. #60162
- Refresh errors on Iceberg external tables when using partition expressions based on `str2date`. #60089
- Unexpected behavior caused by materialized view schema changes. #60079
- Issues related to low-cardinality global dictionaries in UNION operators. #60075
- Incorrect partition ranges for temporary partitions created using the START END syntax. #60014
- Lock issues with SUBMIT TASK. #60026
- Partial updates fail on Primary Key tables under certain conditions. #60052
- Crashes caused by BE failing to create directories due to a lack of permissions to access storage paths. #60028
- Cache failures due to cache key duplication in concurrent scenarios. #60053
- Hive table metadata background refresh failure in Unified Catalog. #55215
- Query failures caused by incorrect return types of CASE WHEN. #59972
- Query failures when Delta Lake tables UNION themselves. #60030
- Partition creation failure when writing to multiple tables within the same transaction. #59954
- Queries could return empty results instead of errors when tablet versions were updated during execution. #53060
- Queries against modified columns in a table return null after upgrading to v3.4. #59941
- Authentication information is printed in logs. #59907
- Metadata refresh failures for external tables in Hive Catalog. #54596
- CACHE SELECT failures for tables after schema changes. #59812
- Broker Load could not recover after FE Leader shifts. #59732
- Stream Load failures when the target table name contains Chinese characters. #59722
- Incorrect query results in external tables due to search key hash collisions (affecting Iceberg/Delta/Paimon). #59781
3.3.15
Bug Fixes
Fixed the following issues:
- Missing double quotes for string parameters in statistics INSERT statements. #59713
- Downgrade failure caused by Rollup tasks. #59735
- Incorrect function parameters in the result of `SHOW CREATE VIEW`. #59714
- A security issue where SQL statements with syntax errors exposed sensitive information in the Audit Log. #59442
- Error "Query version not found". #59194
- Failure to change data distribution using the `ALTER TABLE` statement. #59360
- An issue where root user processes were still visible when admin protection was enabled. #59435
- Failure of `INSERT OVERWRITE` into Hive. #59469
- Missing Tablet ID in the `max_tablet_rowset_num` log item. #59467
- An error caused by misconfigured Persistent Index parameters on a Duplicate table. #56040
- TaskRun history being archived on FE Follower nodes. #59393
- External catalog-based materialized view refresh errors. #59369
- Missing minimum version in Tablet information on shared-data clusters. #59373
- Abnormal maximum column unique ID in native tables of shared-data clusters due to version compatibility logic errors. #59190
- Materialized view refresh failure on Iceberg catalogs when the source Iceberg table is dropped and recreated, and manual refresh also fails after the materialized view is set to active. #59287
- Contamination of parameters in materialized view refresh tasks. #59052
- Data loss caused by Persistent Index when loading snapshot fails. #59247
- Issues caused when subcolumns of STRUCT appear in multiple predicates. #59216
- Query failure after renaming columns. #59178
- Loading failure due to multiple Stream Load requests. #59181
- Inability to refresh Hive table-based materialized views at the partition level in Unified Catalog. #59139
- Incorrect UNION plan causing FE out-of-memory (OOM). #59030
- Version loss during data loading. #59006
- Predicate loss when queries are rewritten to synchronous materialized views. #58831
- Issues with BITMAP/HLL/PERCENTILE data types in window functions. #58776
- Metadata changes to the external tables in Hive Catalog cannot be refreshed. #54596
Behavior Changes
- Introduced the FE configuration parameter `task_runs_max_history_number` to control the number of historical TaskRuns retained in the `information_schema.task_runs` view, reducing memory usage. #59161
3.5.0
Release Date: June 13, 2025
Upgrade Notes
- JDK 17 or later is required from StarRocks v3.5.0 onwards.
- To upgrade a cluster from v3.4 or earlier, you must upgrade the JDK version that StarRocks depends on, and remove the options that are incompatible with JDK 17 from the configuration item `JAVA_OPTS` in the FE configuration file fe.conf, for example, options that involve CMS and GC. The default value of `JAVA_OPTS` in the v3.5 configuration file is recommended.
- For clusters using external catalogs, you need to add `--add-opens=java.base/java.util=ALL-UNNAMED` to the `JAVA_OPTS` configuration item in the BE configuration file be.conf.
- In addition, as of v3.5.0, StarRocks no longer provides JVM configurations for specific JDK versions. All versions of JDK use `JAVA_OPTS`.
Shared-data Enhancement
- Shared-data clusters support generated columns. #53526
- Cloud-native Primary Key tables in shared-data clusters support rebuilding specific indexes. The performance of the indexes is also optimized. #53971 #54178
- Optimized the execution logic of large-scale data loading operations to avoid generating too many small files in Rowset due to memory limitations. During the import, the system will merge the temporary data blocks to reduce the generation of small files, which improves the query performance after the import and also reduces the subsequent Compaction operations to improve the system resource utilization. #53954
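The loading item above says temporary data blocks are merged during the import so that fewer small files end up in a Rowset. It does not describe the merge policy. The following sketch assumes a simple greedy rule: keep accumulating consecutive blocks until a target size is reached, then emit one file. `plan_merged_files` and `target_size` are hypothetical.

```python
from typing import List


def plan_merged_files(block_sizes: List[int], target_size: int) -> List[List[int]]:
    """Group consecutive temporary blocks so each output file reaches target_size.

    Hypothetical greedy policy for illustration; sizes can be in any unit
    (for example MB). The last group may stay below target_size.
    """
    files: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in block_sizes:
        current.append(size)
        current_size += size
        if current_size >= target_size:
            # Enough data buffered: emit one larger file instead of many small ones.
            files.append(current)
            current, current_size = [], 0
    if current:
        files.append(current)  # remainder becomes one final, possibly smaller file
    return files
```

With blocks of 10, 20, 30, 40, and 5 units and a target of 50, five small blocks become two files. Queries and later Compaction then have fewer files to handle.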
Data Lake Analytics
- [Beta] Supports creating Iceberg views in the Iceberg Catalog with Hive Metastore integration, and supports adding or modifying the dialect of an Iceberg view using the ALTER VIEW statement for better syntax compatibility with external systems. #56120
- Supports nested namespace for Iceberg REST Catalog. #58016
- Supports using `IcebergAwsClientFactory` to create AWS clients in Iceberg REST Catalog to offer vended credentials. #58296
- Parquet Reader supports filtering data with Bloom Filter. #56445
- Supports automatically creating global dictionaries for low-cardinality columns in Parquet-formatted Hive/Iceberg tables during queries. #55167
Performance Improvement and Query Optimization
- Statistics optimization:
- Supports Table Sample. Improved statistics accuracy and query performance by sampling data blocks in physical files. #52787
- Supports recording the predicate columns in queries for targeted statistics collection. #53204
- Supports partition-level cardinality estimation. The system reuses the system-defined view `_statistics_.column_statistics` to record the NDV of each partition. #51513
- Supports multi-column Joint NDV collection to optimize the query plan generated by CBO in the scenario where columns correlate with each other. #56481 #56715 #56766 #56836
- Supports using histograms to estimate the Join node cardinality and in_predicate selectivity, thus improving the estimation accuracy in data skew. #57874 #57639
- Optimized Query Feedback. Queries with identical structure but different parameter values are categorized as the same type and share the same tuning guide for plan execution optimization. #58306
- Supports Runtime Bitset Filter as an optimization alternative to Bloom Filter in specific scenarios. #57157
- Supports pushing down Join Runtime Filter to the storage layer. #55124
- Supports Pipeline Event Scheduler. #54259
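The Joint NDV item above targets columns that correlate with each other. The following toy example shows why this matters. It uses hypothetical helper names and exact counts, not any StarRocks statistics API; production systems typically rely on approximate sketches such as HyperLogLog. When `city` determines `country`, multiplying the per-column NDVs overestimates the number of GROUP BY groups.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def ndv(values: Iterable[Hashable]) -> int:
    """Exact distinct count; production systems usually use approximate sketches."""
    return len(set(values))


def group_by_estimates(rows: Sequence[Tuple], cols: Sequence[int]) -> Tuple[int, int]:
    """Return (independence-based estimate, joint NDV) for GROUP BY on `cols`."""
    product = 1
    for c in cols:
        product *= ndv(r[c] for r in rows)
    # Assuming independence, the group count is the product of per-column
    # NDVs, capped at the row count.
    independent = min(product, len(rows))
    # Joint NDV counts the value combinations that actually occur.
    joint = ndv(tuple(r[c] for c in cols) for r in rows)
    return independent, joint
```

Consider 1,000 rows of `(city, country)` where 10 cities map to 2 countries. The independence assumption predicts 20 groups, but only 10 exist. Errors like this can propagate into later cardinality estimates, which is what joint statistics help avoid.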
Partition Management
- Supports using ALTER TABLE to merge expression partitions based on time functions for optimized storage efficiency and query performance. #56840
- Supports partition Time-to-live (TTL) for List-partitioned tables and materialized views. Also supports the property `partition_retention_condition` in tables and materialized views, allowing users to set data retention strategies for list partitions and thus achieve more flexible partition deletion. #53117
- Supports using ALTER TABLE to delete partitions specified by common partition expressions, allowing users to flexibly delete partitions in batches. #53118
Cluster Management
- Upgraded FE compile target from Java 11 to Java 17 for better system stability and performance. #53617 #57030
Security and Authentication
- Supports secure connections encrypted by SSL based on the MySQL protocol. #54877
- Enhanced authentication using external systems:
- Supports creating StarRocks users with OAuth 2.0 and JSON Web Token (JWT).
- Supports Security Integration to simplify the authentication process with external systems. Security Integration supports LDAP, OAuth 2.0, and JWT. #55846
- Supports Group Provider to obtain the user group information from external authentication services. The group information can then be used in authentication and authorization. Group Provider supports acquiring group information from LDAP, operating systems, or files. Users can query the user group they belong to using the function `current_group()`. #56670
Materialized Views
- Supports creating materialized views with multiple partition columns to allow users to partition the data with a more flexible strategy. #52576
- Supports setting `query_rewrite_consistency` to `force_mv` to force the system to use the materialized view for query rewrite, keeping performance stable at the cost of some data timeliness. #53819
Loading and Unloading
- Supports pausing Routine Load jobs on JSON parse errors by setting the property `pause_on_json_parse_error` to `true`. #56062
- [Beta] Supports transactions with multiple SQL statements (currently, only INSERT is supported). Users can start, apply, or undo a transaction to guarantee the ACID (atomicity, consistency, isolation, and durability) properties of multiple loading operations. #53978
Functions
- Introduced the system variable `lower_upper_support_utf8` at the session and global levels, enhancing support for UTF-8 strings (especially non-ASCII characters) in case conversion functions such as `upper()` and `lower()`. #56192
- Added new functions:
3.4.4
Release Date: June 10, 2025
Improvements
- Storage Volume now supports ADLS2 using Managed Identity as the credential. #58454
- For partitions based on complex time function expressions, partition pruning works well for partitions based on most DATETIME-related functions.
- Supports loading Avro data files from Azure using the `FILES` function. #58131
- When Routine Load encounters invalid JSON data, the consumed partition and offset information is logged in the error log to facilitate troubleshooting. #55772
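The pruning item above for partitions based on time function expressions states the outcome but not the mechanism. One common approach works for partitions defined by a monotonic time function. Map each partition to the half-open time range it covers, then keep only the partitions that overlap the query's predicate range. The following sketch applies this to monthly partitions built with `date_trunc('month', dt)`. That partition expression is an assumed example, and the helper names are hypothetical. This is not a description of StarRocks' optimizer code.

```python
from datetime import datetime
from typing import Iterable, List


def month_start(ts: datetime) -> datetime:
    """Python stand-in for date_trunc('month', ts)."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def next_month(m: datetime) -> datetime:
    """First instant of the month after month-start `m`."""
    if m.month == 12:
        return m.replace(year=m.year + 1, month=1)
    return m.replace(month=m.month + 1)


def prune(partitions: Iterable[datetime], lo: datetime, hi: datetime) -> List[datetime]:
    """Keep monthly partitions that may contain rows with lo <= dt < hi.

    date_trunc is monotonic, so partition p covers [p, next_month(p)).
    Pruning reduces to an interval-overlap test against the predicate range.
    """
    return [p for p in partitions if p < hi and next_month(p) > lo]
```

Take six monthly partitions from January through June 2025 and a predicate `dt >= '2025-02-15' AND dt < '2025-04-01'`. Only the February and March partitions survive.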
Bug Fixes
The following issues have been fixed:
- Concurrent queries accessing the same partition in a partitioned table caused Hive Metastore to hang. #58089
- Abnormal termination of `INSERT` tasks caused the job to remain in the `QUEUEING` state. #58603
- After upgrading the cluster from v3.4.0 to v3.4.2, a large number of tablet replicas encounter exceptions. #58518
- FE OOM caused by incorrect `UNION` execution plans. #59040
- Invalid database IDs during partition recycling could cause FE startup to fail. #59666
- After a failed FE CheckPoint operation, the process could not exit properly, resulting in blocking. #58602
3.3.14
Release Date: May 14, 2025
Improvements
- Optimized error messages for regex parsing failures. #57904
- Fixed security vulnerabilities SNYK-JAVA-ORGJSON-5488379 and SNYK-JAVA-ORGJSON-5962464. #58425
Bug Fixes
Fixed the following issues:
- Issues with the JSON data type in `first_value`/`last_value`/`lead`/`lag` window functions. #58697
- Deadlock caused by table-level locks from base tables during materialized view writes (after the bug fix, DB-level locks are used). #58615
- INSERT tasks hang when the target table is deleted. #58603
- Failure to change active/inactive state of materialized views with List partitions. #58575
- Incorrect `streaming_load_current_processing` metric. #58565
- Data version update errors caused by continuous loading and replica clone tasks. #58513
- Failed to refresh materialized views on external tables. #58506
- Incorrect `if()` results on ARM architecture. #58455
- Materialized view rewriting generated incorrect query plans. #58487
- Iceberg table metadata did not refresh automatically. #58490
- Incorrect query plan generated by `group_concat`. #57908
- Mass Tablet load failures caused by unhandled exceptions during loading. #58393
- Constant folding failed due to type mismatches while pruning List partitions with generated columns (after the bug fix, an implicit cast rule was added). #54543
- Mismatch between aggregate function return type and original column type (after the bug fix, the column type is cast to the function output type). #58407
- `broadcast_row_limit` set to 0 or below failed to prevent BROADCAST JOIN generation. #58307
- Broker Load used BE nodes that had already been blacklisted. #58350
- Asynchronous tasks persist in the background and cannot be dropped after manually cancelling materialized view refresh tasks. #58310
- Failed to create expression partitions with month or year granularity. #58182
- `ngram_search` generated invalid query plans. #58190
3.4.3
Release Date: April 30, 2025
Improvements
- Routine Load and Stream Load support Lambda expressions in the `columns` parameter for complex column data extraction. `array_filter`/`map_filter` can be used to filter and extract ARRAY/MAP data. Complex filtering and extraction of JSON data can be achieved by combining these functions with the `cast` function, which converts a JSON array/JSON object to the ARRAY or MAP type. For example, `COLUMNS (js, col=array_filter(i -> json_query(i, '$.type')=='t1', cast(js as Array<JSON>))[1])` extracts the first JSON object from the JSON array `js` whose `type` is `t1`. #58149
- Supports converting JSON objects to the MAP type using the `cast` function, which can be combined with `map_filter` to extract the items of a JSON object that meet specific conditions. For example, `map_filter((k, v) -> json_query(v, '$.type') == 't1', cast(js AS MAP<String, JSON>))` extracts the entries of the JSON object `js` whose `type` is `t1`. #58045
- LIMIT is now supported when querying the `information_schema.task_runs` view. #57404
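The following Python analogue makes the two StarRocks expressions above concrete. It runs outside StarRocks and only mirrors the filtering logic. The function names are hypothetical. Returning `None` when nothing matches is this sketch's choice; it does not describe how StarRocks treats an out-of-range array subscript.

```python
import json
from typing import Any, Dict, Optional


def first_of_type(js: str, wanted: str) -> Optional[Any]:
    """Analogue of array_filter(i -> json_query(i, '$.type') == wanted,
    cast(js AS ARRAY<JSON>))[1]."""
    items = json.loads(js)  # JSON array -> Python list
    matches = [i for i in items if isinstance(i, dict) and i.get("type") == wanted]
    # SQL array subscripts start at 1, so [1] corresponds to Python index 0.
    return matches[0] if matches else None


def entries_of_type(js: str, wanted: str) -> Dict[str, Any]:
    """Analogue of map_filter((k, v) -> json_query(v, '$.type') == wanted,
    cast(js AS MAP<STRING, JSON>))."""
    obj = json.loads(js)  # JSON object -> Python dict
    return {k: v for k, v in obj.items() if isinstance(v, dict) and v.get("type") == wanted}
```

For `[{"type": "t0"}, {"type": "t1", "v": 2}, {"type": "t1", "v": 3}]`, the array version picks `{"type": "t1", "v": 2}`. The map version keeps only the keys whose values have `type` equal to `t1`.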
Bug Fixes
The following issues have been fixed:
- Queries against ORC format Hive tables are returned with an error `OrcChunkReader::lazy_seek_to failed. reason = bad read in RleDecoderV2: :readByte`. #57454
- RuntimeFilter from the upper layer could not be pushed down when querying Iceberg tables that contain Equality Delete files. #57651
- Enabling the spill-to-disk pre-aggregation strategy causes queries to fail. #58022
- Queries are returned with an error `ConstantRef-cmp-ConstantRef not supported here, null != 111 should be eliminated earlier`. #57735
- Query timeout with the `query_queue_pending_timeout_second` parameter while the Query Queue feature is not enabled. #57719
3.3.13
Release Date: April 22, 2025
Improvements
- Added memory consumption metrics for queries in FE in audit logs and the QueryDetail interface. #57731
- Optimized the strategy for concurrent creation of expression partitions. #57899
- Added monitoring metrics for the number of active FE nodes. #57857
- The `information_schema.task_runs` view supports pushdown of the LIMIT clause. #57404
- Fixed several CVE issues. #57705 #57620
- Primary Key tables support retry during the PUBLISH stage, enhancing system disaster recovery capabilities. #57354
- Reduced memory consumption of Flat JSON. #57357
- The `information_schema.routine_load_jobs` view adds the `timestamp_progress` column, consistent with the output of the SHOW ROUTINE LOAD statement. #57123
- Disallowed unauthorized behaviors from StarRocks to LDAP. #57131
- Supports returning an error when the schema of an AVRO file does not match the schema of the Hive table. #57296
- Materialized views support the `excluded_refresh_tables` property. #56428
Bug Fixes
Fixed the following issues:
- Flat JSON does not support the `get_json_bool` function. #58077
- SHOW AUTHENTICATION statement returns the password. #58072
- The `percentile_count` function returns incorrect values. #58038
- Issues caused by spilling strategies. #58022
- After a BE is blacklisted, Stream Load still dispatches tasks to the BE, causing task failures. #57919
- Issues when using the `cast` function with semi-structured data types. #57804
- The `array_map` function returns incorrect values. #57756
- In the scenario of a single tablet, using multiple `distinct` functions on the same column with a single-column GROUP BY clause leads to incorrect query results. #57690
- MIN/MAX values in the profiles of big queries are inaccurate. #57655
- Non-partitioned materialized views based on Delta Lake data cannot rewrite queries. #57686
- A Routine Load deadlock issue. #57430
- Predicate pushdown issues with DATE/DATETIME columns. #57576
- An issue when the `percentile_disc` function has an empty input. #57572
- When modifying the bucket distribution of a table with the statement `ALTER TABLE {table} PARTITIONS (p1, p1) DISTRIBUTED BY ...`, specifying duplicate partition names could result in failure to delete internally generated temporary partitions. #57005
- ALTER TABLE MODIFY COLUMN fails with expression partitioned tables based on the `str2date` function. #57487
- CACHE SELECT issue with semi-structured columns. #57448
- Upgrade compatibility issue caused by `hadoop-lib`. #57436
- Case sensitivity error issues when creating partitions. #54867
- Some columns generate incorrect sort keys during updates. #57375
- Unknown issues caused by nested window functions. #57216