Releases · StarRocks/starrocks
3.3.14
Release Date: May 14, 2025
Improvements
- Optimized error messages for regex parsing failures. #57904
- Fixed security vulnerabilities SNYK-JAVA-ORGJSON-5488379 and SNYK-JAVA-ORGJSON-5962464. #58425
Bug Fixes
Fixed the following issues:
- Issues with the JSON data type in `first_value`/`last_value`/`lead`/`lag` window functions. #58697
- Deadlock caused by table-level locks from base tables during materialized view writes (after the bug fix, DB-level locks are used). #58615
- INSERT tasks hang when the target table is deleted. #58603
- Failure to change active/inactive state of materialized views with List partitions. #58575
- Incorrect `streaming_load_current_processing` metric. #58565
- Data version update errors caused by continuous loading and replica clone tasks. #58513
- Failed to refresh materialized views on external tables. #58506
- Incorrect `if()` results on ARM architecture. #58455
- Materialized view rewriting generated incorrect query plans. #58487
- Iceberg table metadata did not refresh automatically. #58490
- Incorrect query plan generated by `group_concat`. #57908
- Mass Tablet load failures caused by unhandled exceptions during loading. #58393
- Constant folding failed due to type mismatches while pruning List partitions with generated columns (after the bug fix, an implicit cast rule was added). #54543
- Mismatch between aggregate function return type and original column type (after the bug fix, the column type is cast to the function output type). #58407
- `broadcast_row_limit` set to 0 or below failed to prevent BROADCAST JOIN generation. #58307
- Broker Load used BE nodes that had already been blacklisted. #58350
- Asynchronous tasks persist in the background and cannot be dropped after manually cancelling materialized view refresh tasks. #58310
- Failed to create expression partitions with month or year granularity. #58182
- `ngram_search` generated invalid query plans. #58190
3.3.13
Release Date: April 22, 2025
Improvements
- Added memory consumption metrics for queries in FE in audit logs and the QueryDetail interface. #57731
- Optimized the strategy for concurrent creation of expression partitions. #57899
- Added monitoring metrics for the number of active FE nodes. #57857
- The `information_schema.task_runs` view supports pushdown of the LIMIT clause. #57404
- Fixed several CVE issues. #57705 #57620
- Primary Key tables support retry during the PUBLISH stage, enhancing system disaster recovery capabilities. #57354
- Reduced memory consumption of Flat JSON. #57357
- The `information_schema.routine_load_jobs` view adds the `timestamp_progress` column, consistent with the output of the SHOW ROUTINE LOAD statement. #57123
- Disallowed unauthorized behaviors from StarRocks to LDAP. #57131
- Supports returning an error when the schema of an AVRO file does not match the schema of the Hive table. #57296
- Materialized views support the `excluded_refresh_tables` property (see the sketch below). #56428
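A minimal sketch of how this property might be set, assuming it is configured through `ALTER MATERIALIZED VIEW` and takes a comma-separated list of base tables to exclude from refresh; the view name, table name, and value format are illustrative:

```sql
-- Hypothetical example: exclude a frequently changing base table from
-- participating in refreshes of an asynchronous materialized view.
-- The property name comes from the release note; the value format is assumed.
ALTER MATERIALIZED VIEW sales_summary_mv
SET ("excluded_refresh_tables" = "dim_exchange_rate");
```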
Bug Fixes
Fixed the following issues:
- Flat JSON does not support the `get_json_bool` function. #58077
- SHOW AUTHENTICATION statement returns the password. #58072
- The `percentile_count` function returns incorrect values. #58038
- Issues caused by spilling strategies. #58022
- After a BE is blacklisted, Stream Load still dispatches tasks to the BE, causing task failures. #57919
- Issues when using the `cast` function with semi-structured data types. #57804
- The `array_map` function returns incorrect values. #57756
- In the scenario of a single tablet, using multiple `distinct` functions on the same column with a single-column GROUP BY clause leads to incorrect query results. #57690
- MIN/MAX values in the profiles of big queries are inaccurate. #57655
- Non-partitioned materialized views based on Delta Lake data cannot rewrite queries. #57686
- A Routine Load deadlock issue. #57430
- Predicate pushdown issues with DATE/DATETIME columns. #57576
- An issue when the `percentile_disc` function has an empty input. #57572
- When modifying the bucket distribution of a table with the statement `ALTER TABLE {table} PARTITIONS (p1, p1) DISTRIBUTED BY ...`, specifying duplicate partition names could result in failure to delete internally generated temporary partitions. #57005
- ALTER TABLE MODIFY COLUMN fails with expression partitioned tables based on the `str2date` function. #57487
- CACHE SELECT issue with semi-structured columns. #57448
- Upgrade compatibility issue caused by `hadoop-lib`. #57436
- Case sensitivity error issues when creating partitions. #54867
- Some columns generate incorrect sort keys during updates. #57375
- Unknown issues caused by nested window functions. #57216
3.4.2
Release Date: April 10, 2025
Improvements
- FE supports graceful shutdown to improve system availability. When exiting FE via `./stop_fe.sh -g`, FE will first return a 500 status code to the front-end Load Balancer via the `/api/health` API to indicate that it is preparing to shut down, allowing the Load Balancer to switch to other available FE nodes. Meanwhile, FE will continue to run ongoing queries until they finish or time out (default timeout: 60 seconds). #56823
Bug Fixes
The following issues have been fixed:
- Partition pruning might not work if the partition column is a generated column. #54543
- Incorrect parameter handling in the `concat` function could cause a BE crash during query execution. #57522
- The `ssl_enable` property did not take effect when using Broker Load to load data. #57229
- When NULL values exist, querying subfields of STRUCT-type columns could cause a BE crash. #56496
- When modifying the bucket distribution of a table with the statement `ALTER TABLE {table} PARTITIONS (p1, p1) DISTRIBUTED BY ...`, specifying duplicate partition names could result in failure to delete internally generated temporary partitions. #57005
- In a shared-data cluster, running `SHOW PROC '/current_queries'` resulted in the error "Error 1064 (HY000): Sending collect query statistics request fails". #56597
- Running `INSERT OVERWRITE` loading tasks in parallel caused the error "ConcurrentModificationException: null", resulting in loading failure. #56557
- After upgrading from v2.5.21 to v3.1.17, running multiple Broker Load tasks concurrently could cause exceptions. #56512
Behavior Changes
- The default value of the BE configuration item `avro_ignore_union_type_tag` has been changed to `true`, enabling the direct parsing of `["NULL", "STRING"]` as STRING type data, which better aligns with typical user requirements. #57553
- The default value of the session variable `big_query_profile_threshold` has been changed from 0 to 30 (seconds) (see the sketch after this list). #57177
- A new FE configuration item `enable_mv_refresh_collect_profile` has been added to control whether to collect Profile information during materialized view refresh. The default value is `false` (previously, the system collected Profile by default). #56971
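For reference, a quick way to inspect or override the new session-variable default; the value shown is illustrative:

```sql
-- Inspect the current value of the session variable.
SHOW VARIABLES LIKE 'big_query_profile_threshold';

-- Override it for the current session. The release note describes the value
-- in seconds; depending on the version, the variable may instead accept a
-- duration string such as '60s'.
SET big_query_profile_threshold = 60;
```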
3.3.12
Release date: April 3, 2025
New Features
- Supports the `percentile_approx_weighted` function (see the sketch after this list). #56654
- Supports modifying properties of Hive Catalog and Hudi Catalog. #56212
- Paimon Catalog supports manifest cache. #55788
- Supports `SHOW PARTITIONS` for tables in Paimon Catalog. #55785
- Supports statistics collection for Paimon Catalog. #55757
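A hedged sketch of the new function, assuming the common signature `percentile_approx_weighted(value, weight, percentile)`; the table and column names are made up:

```sql
-- Approximate weighted median of response latency, weighting each row by its
-- request count. The (value, weight, percentile) signature is an assumption.
SELECT percentile_approx_weighted(latency_ms, request_count, 0.5) AS weighted_p50
FROM request_stats;
```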
Improvements
- Various improvements and bug fixes related to statistics. #57147 #57238 #57170 #57154 #57124 #57047 #56956 #57031 #56904 #56950 #56671 #55922
- Optimized error messages when table creation fails. #57055
- Enhanced retry mechanism for Broker Load. #56987
- Improved performance of `array_generate`. #57252
- Aborted ongoing Compaction tasks for deleted partitions. #56943
- Optimized error messages when `ALTER TABLE` fails. #57054
- Removed unnecessary reverse step from `array_agg()` to improve performance. #56958
- Added checksum verification for replicas in Primary Key tables. #56519
- Masked sensitive information in the `FILES` function output. #56684
- Reduced noisy logs related to materialized views. #56672
- Upgraded Iceberg version to 1.7.1. #55271
Bug Fixes
- `INSERT INTO FILES` did not support CSV delimiter conversion. #57126
- Issues with Iceberg REST Catalog. #55416
- Predicate was lost during rewrite for view-based materialized views. #57153
- Paimon Catalog failed to read tables with schema changes. #56796
- Timezone conversion issue in Paimon Catalog. #56879
- `SHOW MATERIALIZED VIEWS` did not display `default_catalog` information. #56362
- In Trino dialect mode, time strings containing 'T' were not accepted. (Solution: replaced `parse_datetime` with `str_to_jodatime`.) #56565
- Incorrect result of `first_value` function. #56467
- Incorrect result of `concat_ws` function. #56384
3.4.1
Release Date: March 12, 2025
New Features and Enhancements
- Data lake analytics supports Deletion Vector in Delta Lake.
- Supports secure views. By creating a secure view, you can prevent users without the SELECT privilege on the referenced base tables from querying the view (even if they have the SELECT privilege on the view).
- Supports Sketch HLL (`ds_hll_count_distinct`). Compared to `approx_count_distinct`, this function provides higher-precision approximate deduplication (see the sketch after this list).
- Storage Volume in shared-data clusters supports Azure Data Lake Storage Gen2.
- Supports SSL authentication for connections to StarRocks via the MySQL protocol, ensuring that data transmitted between the client and the StarRocks cluster cannot be read by unauthorized users.
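A minimal sketch comparing the new Sketch-HLL function with `approx_count_distinct`; the table and column are illustrative, and any optional precision arguments are omitted:

```sql
-- Higher-precision approximate distinct count via the DataSketches HLL function,
-- alongside the existing approx_count_distinct for comparison.
SELECT
    approx_count_distinct(user_id) AS approx_users,
    ds_hll_count_distinct(user_id) AS sketch_users
FROM events;
```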
Bug Fixes
The following issues have been fixed:
- An issue where OLAP views affected the materialized view processing logic. #52989
- Write transactions would fail if one replica was not found, regardless of how many replicas had successfully committed. (After the fix, the transaction succeeds as long as the majority of replicas succeed.) #55212
- Stream Load failed when a node with an Alive status of false was scheduled. #55371
- Files in cluster snapshots were mistakenly deleted. #56338
Behavior Changes
- Graceful shutdown is now enabled by default (previously it was disabled). The default value of the related BE/CN parameter `loop_count_wait_fragments_finish` has been changed to `2`, meaning that the system will wait up to 20 seconds for running queries to complete. #56002
3.3.11
Release date: March 7, 2025
Improvements
- `FILES` supports exporting JSON type data into Parquet files (see the sketch after this list). #56406
- Optimized Data Cache WarmUp performance for cloud-native tables in shared-data clusters. #56190
- Supports parsing `AT TIME ZONE` expressions and the `from_iso8601_timestamp` function in Trino. #56311 #55573
- Partial Updates for Primary Key tables within shared-data clusters support Condition Updates. #56132
- Extended support for statistics collection across all types of SQL statements. #56257
- Supports configuring the maximum number of returned rows for `SHOW PROC '/transaction'`. #55933
- Supports creating asynchronous materialized views on Oracle-type JDBC Catalog tables. #55372
- MemTracker on BE WebUI supports pagination with 25 rows per page. #56206
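A sketch of unloading query results, including a JSON column, to Parquet with `INSERT INTO FILES`; the S3 path, credentials, and table/column names are placeholders:

```sql
-- Unload rows, including a JSON-typed column, into Parquet files on S3.
-- Path, credential properties, and table/column names are placeholders.
INSERT INTO FILES(
    "path" = "s3://my-bucket/export/events/",
    "format" = "parquet",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
)
SELECT event_id, event_time, payload_json
FROM events;
```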
Bug Fixes
Fixed the following issues:
- FE does not support casting constant TIME data types into DATETIME. #55804
- Stream Load transaction interface does not support the `starrocks_fe_table_load_rows` and `starrocks_fe_table_load_bytes` metrics. #44991
- Changes to automatic statistics collection do not take effect. #56173
- Materialized views in abnormal states caused issues with `SHOW MATERIALIZED VIEWS`. #55995
- Text-based materialized view rewrite does not work across different databases. #56001
- Metadata compatibility issues in JDBC Catalogs. #55993
- Issues with handling the JSON data type in JDBC Catalogs. #56008
- Incorrect Sort Key settings during Schema Change. #55902
- Credential information leak issue in Broker Load. #55358
Behavior Changes
- Added authentication to the `query_detail` interface in FE. #55919
3.2.15
Release date: February 14, 2025
New Features
- Window functions support max_by and min_by. #54961
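A brief sketch of `max_by` used as a window function; the table and columns are illustrative:

```sql
-- For each row, report the product with the highest revenue within its category.
SELECT
    category,
    product,
    revenue,
    max_by(product, revenue) OVER (PARTITION BY category) AS top_product_in_category
FROM sales;
```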
Improvements
- Added StarClient timeout parameters. #54496
  - `star_client_read_timeout_seconds`
  - `star_client_list_timeout_seconds`
  - `star_client_write_timeout_seconds`
- Tables with List partitioning strategies support partition pruning for DELETE statements. #55400
3.4.0
Release date: January 24, 2025
Data Lake Analytics
- Optimized Iceberg V2 query performance and lowered memory usage by reducing repeated reads of delete-files.
- Supports column mapping for Delta Lake tables, allowing queries against data after Delta Schema Evolution. For more information, see Delta Lake catalog - Feature support.
- Data Cache related improvements:
- Introduces a Segmented LRU (SLRU) Cache eviction strategy, which significantly defends against cache pollution from occasional large queries, improves cache hit rate, and reduces fluctuations in query performance. In simulated test cases with large queries, SLRU-based query performance can be improved by 70% or even higher. For more information, see Data Cache - Cache replacement policies.
- Unified the Data Cache instance used in both shared-data architecture and data lake query scenarios to simplify the configuration and improve resource utilization. For more information, see Data Cache.
- Provides an adaptive I/O strategy optimization for Data Cache, which flexibly routes some query requests to remote storage based on the cache disk's load and performance, thereby enhancing overall access throughput.
- Supports automatic collection of external table statistics through automatic ANALYZE tasks triggered by queries. It can provide more accurate NDV information compared to metadata files, thereby optimizing the query plan and improving query performance. For more information, see Query-triggered collection.
- Provides Time Travel query capability for Iceberg, allowing data to be read from a specified BRANCH or TAG by specifying TIMESTAMP or VERSION (see the sketch after this list).
- Supports asynchronous delivery of query fragments for data lake queries. It avoids the restriction that FE must obtain all files to be queried before BE can execute a query, thus allowing FE to fetch query files and BE to execute queries in parallel, and reducing the overall latency of data lake queries involving a large number of files that are not in the cache. Meanwhile, it reduces the memory load on FE due to caching the file list and improves query stability. (Currently, the optimization for Hudi and Delta Lake is implemented, while the optimization for Iceberg is still under development.)
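A hedged sketch of the Time Travel capability described above; the exact clause names may differ by version, and the catalog, table, timestamp, and snapshot ID are placeholders:

```sql
-- Read an Iceberg table as of a past point in time (assumed syntax).
SELECT * FROM iceberg_catalog.sales_db.orders
FOR TIMESTAMP AS OF '2025-01-01 00:00:00';

-- Read a specific snapshot/version (assumed syntax; snapshot ID is a placeholder).
SELECT * FROM iceberg_catalog.sales_db.orders
FOR VERSION AS OF 4567281094253132987;
```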
Performance Improvement and Query Optimization
- [Experimental] Offers a preliminary Query Feedback feature for automatic optimization of slow queries. The system will collect the execution details of slow queries, automatically analyze their query plans for potential optimization opportunities, and generate a tailored optimization guide for each query. If CBO generates the same bad plan for subsequent identical queries, the system will locally optimize this query plan based on the guide. For more information, see Query Feedback.
- [Experimental] Supports Python UDFs, offering more convenient function customization compared to Java UDFs. For more information, see Python UDF.
- [Experimental] Supports Arrow Flight interface for more efficient reading of large data volumes in query results. It also allows BE, instead of FE, to process the returned results, greatly reducing the pressure on FE. It is especially suitable for business scenarios involving big data analysis and processing, and machine learning.
- Enables the pushdown of multi-column OR predicates, allowing queries with multi-column OR conditions (for example, `a = xxx OR b = yyy`) to utilize certain column indexes, thus reducing data read volume and improving query performance (see the sketch after this list).
- Optimized TPC-DS query performance by roughly 20% under the TPC-DS 1TB Iceberg dataset. Optimization methods include table pruning and aggregated column pruning using primary and foreign keys, and aggregation pushdown.
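For example, a query of the shape mentioned above that can now benefit from per-column index pushdown; the table and columns are illustrative:

```sql
-- Both branches of the OR can now be pushed down and use the respective
-- column indexes instead of forcing a full scan of the table.
SELECT order_id, customer_id, region
FROM orders
WHERE customer_id = 10001 OR region = 'EU';
```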
Shared-data Enhancements
- Supports Query Cache, aligning with the shared-nothing architecture.
- Supports synchronous materialized views, aligning with the shared-nothing architecture.
Storage Engine
- Unified all partitioning methods into expression partitioning and supported multi-level partitioning, where each level can be any expression (see the sketch after this list). For more information, see Expression Partitioning.
- [Preview] Supports all native aggregate functions in Aggregate tables. By introducing a generic aggregate function state storage framework, all native aggregate functions supported by StarRocks can be used to define an Aggregate table.
- Supports vector indexes, enabling fast approximate nearest neighbor searches (ANNS) of large-scale, high-dimensional vectors, which are commonly required in deep learning and machine learning scenarios. Currently, StarRocks supports two types of vector indexes: IVFPQ and HNSW.
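A minimal sketch of a table using multi-level expression partitioning; the schema, expressions, and key choice are illustrative, and the multi-expression `PARTITION BY` form is assumed from the item above:

```sql
-- Two-level expression partitioning: first by month of the event time,
-- then by the raw region column. Table schema is illustrative.
CREATE TABLE events (
    event_time DATETIME NOT NULL,
    region     VARCHAR(32) NOT NULL,
    user_id    BIGINT,
    payload    JSON
)
DUPLICATE KEY (event_time, region)
PARTITION BY date_trunc('month', event_time), region
DISTRIBUTED BY HASH(user_id);
```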
Loading
- INSERT OVERWRITE now supports a new semantic - Dynamic Overwrite. When this semantic is enabled, the ingested data will either create new partitions or overwrite existing partitions that correspond to the new data records. Partitions not involved will not be truncated or deleted. This semantic is especially useful when users want to recover data in specific partitions without specifying the partition names. For more information, see Dynamic Overwrite.
- Optimized the data ingestion with INSERT from FILES to replace Broker Load as the preferred loading method:
- FILES now supports listing files in remote storage, and providing basic statistics of the files. For more information, see FILES - list_files_only.
- INSERT now supports matching columns by name, which is especially useful when users load data from numerous columns with identical names. (The default behavior matches columns by their position.) For more information, see Match column by name.
- INSERT supports specifying PROPERTIES, aligning with other loading methods. Users can specify `strict_mode`, `max_filter_ratio`, and `timeout` for INSERT operations to control the behavior and quality of the data ingestion (see the sketch after this list). For more information, see INSERT - PROPERTIES.
- INSERT from FILES supports pushing down the target table schema check to the Scan stage of FILES to infer a more accurate source data schema. For more information, see Push down target table schema check.
- FILES supports unionizing files with different schemas. The schemas of Parquet and ORC files are unionized based on the column names, and those of CSV files are unionized based on the position (order) of the columns. When there are mismatched columns, users can choose to fill the columns with NULL or return an error by specifying the property `fill_mismatch_column_with`. For more information, see Union files with different schema.
- FILES supports inferring the STRUCT type data from Parquet files. (In earlier versions, STRUCT data was inferred as STRING type.) For more information, see Infer STRUCT type from Parquet.
- Supports merging multiple concurrent Stream Load requests into a single transaction and committing data in a batch, thus improving the throughput of real-time data ingestion. It is designed for high-concurrency, small-batch (from KB to tens of MB) real-time loading scenarios. It can reduce the excessive data versions caused by frequent loading operations, resource consumption during Compaction, and IOPS and I/O latency brought by excessive small files.
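A sketch combining two of the items above — loading from FILES() with per-INSERT PROPERTIES — assuming PROPERTIES is placed after the target table; the path, credentials, and thresholds are placeholders:

```sql
-- Load Parquet files from S3 with per-statement ingestion controls.
-- Property names follow the items above; values and paths are placeholders.
INSERT INTO sales_orders
PROPERTIES (
    "strict_mode" = "true",
    "max_filter_ratio" = "0.05",
    "timeout" = "3600"
)
SELECT * FROM FILES(
    "path" = "s3://my-bucket/input/orders/*.parquet",
    "format" = "parquet",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
);
```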
Others
- Optimized the graceful exit process of BE and CN by accurately displaying the status of BE or CN nodes during a graceful exit as `SHUTDOWN`.
- Optimized log printing to avoid excessive disk space being occupied.
- Shared-nothing clusters now support backing up and restoring more objects: logical view, external catalog metadata, and partitions created with expression partitioning and list partitioning strategies.
- [Preview] Supports CheckPoint on Follower FE to avoid excessive memory on Leader FE during CheckPoint, thereby improving the stability of Leader FE.
Downgrade Notes
- Clusters can be downgraded from v3.4.0 only to v3.3.9 and later.
3.4.0-RC01
Release date: January 13, 2025
Data Lake Analytics
- Optimized Iceberg V2 query performance and lowered memory usage by reducing repeated reads of delete-files.
- Supports column mapping for Delta Lake tables, allowing queries against data after Delta Schema Evolution. For more information, see Delta Lake catalog - Feature support.
- Data Cache related improvements:
- Introduces a Segmented LRU (SLRU) Cache eviction strategy, which significantly defends against cache pollution from occasional large queries, improves cache hit rate, and reduces fluctuations in query performance. In simulated test cases with large queries, SLRU-based query performance can be improved by 70% or even higher. For more information, see Data Cache - Cache replacement policies.
- Unified the Data Cache instance used in both shared-data architecture and data lake query scenarios to simplify the configuration and improve resource utilization. For more information, see Data Cache.
- Provides an adaptive I/O strategy optimization for Data Cache, which flexibly routes some query requests to remote storage based on the cache disk's load and performance, thereby enhancing overall access throughput.
- Supports automatic collection of external table statistics through automatic ANALYZE tasks triggered by queries. It can provide more accurate NDV information compared to metadata files, thereby optimizing the query plan and improving query performance. For more information, see Query-triggered collection.
Performance Improvement and Query Optimization
- [Experimental] Offers a preliminary Query Feedback feature for automatic optimization of slow queries. The system will collect the execution details of slow queries, automatically analyze their query plans for potential optimization opportunities, and generate a tailored optimization guide for each query. If CBO generates the same bad plan for subsequent identical queries, the system will locally optimize this query plan based on the guide. For more information, see Query Feedback.
- [Experimental] Supports Python UDFs, offering more convenient function customization compared to Java UDFs. For more information, see Python UDF.
Shared-data Enhancements
- Supports Query Cache, aligning with the shared-nothing architecture.
- Supports synchronous materialized views, aligning with the shared-nothing architecture.
Storage Engine
- Unified all partitioning methods into expression partitioning and supported multi-level partitioning, where each level can be any expression. For more information, see Expression Partitioning.
Loading
- INSERT OVERWRITE now supports a new semantic - Dynamic Overwrite. When this semantic is enabled, the ingested data will either create new partitions or overwrite existing partitions that correspond to the new data records. Partitions not involved will not be truncated or deleted. This semantic is especially useful when users want to recover data in specific partitions without specifying the partition names. For more information, see Dynamic Overwrite.
- Optimized the data ingestion with INSERT from FILES to replace Broker Load as the preferred loading method:
- FILES now supports listing files in remote storage, and providing basic statistics of the files. For more information, see FILES - list_files_only.
- INSERT now supports matching columns by name, which is especially useful when users load data from numerous columns with identical names. (The default behavior matches columns by their position.) For more information, see Match column by name.
- INSERT supports specifying PROPERTIES, aligning with other loading methods. Users can specify `strict_mode`, `max_filter_ratio`, and `timeout` for INSERT operations to control the behavior and quality of the data ingestion. For more information, see INSERT - PROPERTIES.
- INSERT from FILES supports pushing down the target table schema check to the Scan stage of FILES to infer a more accurate source data schema. For more information, see Push down target table schema check.
- FILES supports unionizing files with different schemas. The schemas of Parquet and ORC files are unionized based on the column names, and those of CSV files are unionized based on the position (order) of the columns. When there are mismatched columns, users can choose to fill the columns with NULL or return an error by specifying the property `fill_mismatch_column_with`. For more information, see Union files with different schema.
- FILES supports inferring the STRUCT type data from Parquet files. (In earlier versions, STRUCT data was inferred as STRING type.) For more information, see Infer STRUCT type from Parquet.
Others
- Optimized the graceful exit process of BE and CN by accurately displaying the status of BE or CN nodes during a graceful exit as `SHUTDOWN`.
- Optimized log printing to avoid excessive disk space being occupied.
Downgrade Notes
- Clusters can be downgraded from v3.4.0 only to v3.3.9 and later.
3.3.9
Release date: January 12, 2025
New Features
- Supports the translation of Trino SQL into StarRocks SQL. #54185
Improvements
- Corrected FE node names starting with `bdbje_reset_election_group` to enhance clarity. #54399
- Implemented vectorization for the `IF` function on ARM architectures. #53093
- `ALTER SYSTEM CREATE IMAGE` supports creating an image for StarManager. #54370
- Supports deleting cloud-native indexes of Primary Key tables in shared-data clusters. #53971
- Enforced the refresh of materialized views when the `FORCE` keyword is specified. #52081
- Supports specifying hints in `CACHE SELECT`. #54697
- Supports loading compressed CSV files using the `FILES()` function (see the sketch after this list). Supported compression formats include gzip, bz2, lz4, deflate, and zstd. #54626
- Supports assigning multiple values to the same column in an `UPDATE` statement. #54534
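A sketch of loading a gzip-compressed CSV via FILES(); the path, credentials, CSV options, and the `compression` property name are assumptions to verify against the FILES() documentation for your version:

```sql
-- Load a gzip-compressed CSV file from S3 into an existing table.
-- The "compression" property name and CSV options are assumptions; the
-- path and credential values are placeholders.
INSERT INTO user_events
SELECT * FROM FILES(
    "path" = "s3://my-bucket/raw/events-2025-01-12.csv.gz",
    "format" = "csv",
    "compression" = "gzip",
    "csv.column_separator" = ",",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
);
```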
Bug Fixes
Fixed the following issues:
- Unexpected errors when refreshing materialized views built on JDBC catalogs. #54487
- Instability in results when a Delta Lake table joins itself. #54473
- Upload retries fail when backing up data to HDFS. #53679
- BFD initialization errors on the aarch64 architecture. #54372
- Sensitive information recorded in BE logs. #54677
- Errors in Compaction-related metrics in profiles. #54678
- BE crashes caused by creating tables with nested `TIME` types. #54601
- Query plan errors for `LIMIT` queries with subquery TOP-N. #54507
Downgrade Notes
- Clusters can be downgraded from v3.3.9 only to v3.2.11 and later.