Releases: StarRocks/starrocks
2.5.9
Release date: July 19, 2023
New features
- Queries that contain a different type of join than the materialized view can be rewritten. #25099
Improvements
- StarRocks external tables whose destination cluster is the current StarRocks cluster cannot be created. #25441
- If the queried fields are not included in the output columns of a materialized view but are included in the predicate of the materialized view, the query can still be rewritten. #23028
- Added a new field table_id to the table tables_config in the database Information_schema. You can join tables_config with be_tablets on the column table_id to query the names of the database and table to which a tablet belongs. #24061
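As a rough illustration of the tables_config join described above, the following query looks up the database and table that own one tablet. The tablet ID is a placeholder, and the column names other than table_id (TABLE_SCHEMA, TABLE_NAME, TABLET_ID) are assumptions based on the usual Information Schema naming, so check them against your cluster.

```sql
-- Sketch only: 10003 is a placeholder tablet ID.
-- DISTINCT is used because be_tablets may list one row per replica.
SELECT DISTINCT c.TABLE_SCHEMA, c.TABLE_NAME
FROM information_schema.be_tablets AS t
JOIN information_schema.tables_config AS c
  ON t.TABLE_ID = c.TABLE_ID
WHERE t.TABLET_ID = 10003;
```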
Bug Fixes
Fixed the following issues:
- Count Distinct result is incorrect for Duplicate Key tables. #24222
- BEs may crash if the Join key is a large BINARY column. #25084
- The INSERT operation hangs if the length of CHAR data in a STRUCT to be inserted exceeds the maximum CHAR length defined in the STRUCT column. #25942
- The result of coalesce() is incorrect. #26250
- The version number for a tablet is inconsistent between the BE and FE after data is restored. #26518
- Partitions cannot be automatically created for recovered tables. #26813
3.1.0-rc01
Release date: July 7, 2023
New Features
Shared-data cluster
- Added support for Primary Key tables, on which persistent indexes cannot be enabled.
- Supports the AUTO_INCREMENT column attribute, which enables a globally unique ID for each data row and thus simplifies data management.
- Supports automatically creating partitions during loading and using partitioning expressions to define partitioning rules, thereby making partition creation easier to use and more flexible.
- [Preview] Supports storing data on Azure Blob Storage.
Data Lake analytics
- Supports accessing Parquet-formatted Iceberg v2 tables.
- [Preview] Supports sinking data to Iceberg tables in Parquet format.
- Supports accessing data stored in Elasticsearch by using Elasticsearch catalogs. This simplifies the creation of Elasticsearch external tables.
Storage engine, data ingestion, and query
- Supports random bucketing, which removes the need to configure bucketing columns at table creation. For large data volumes and performance-critical scenarios, we recommend that you continue using hash bucketing.
- Supports using the FILES keyword (in effect, a table function) in INSERT INTO to directly load data from Parquet- or ORC-formatted files stored in AWS S3.
- Supports generated columns. With the generated column feature, StarRocks can automatically generate and store the values of column expressions and automatically rewrite queries to improve query performance.
- Supports loading data into columns of the MAP and STRUCT data types, and supports nesting Fast Decimal values in ARRAY, MAP, and STRUCT.
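The statements below sketch how random bucketing, FILES, and generated columns from the list above might look in practice. All table names, column names, the S3 path, and the credentials are hypothetical placeholders, and the FILES property keys shown are the ones commonly used for AWS S3 access, so verify them against the documentation for your version.

```sql
-- Random bucketing: no bucketing column is named.
CREATE TABLE site_access_random (
    event_day DATE,
    site_id   INT,
    pv        BIGINT
)
DUPLICATE KEY (event_day, site_id)
DISTRIBUTED BY RANDOM;

-- FILES: load a Parquet file from AWS S3 directly through INSERT INTO.
INSERT INTO site_access_random
SELECT * FROM FILES(
    "path" = "s3://example-bucket/path/site_access.parquet",  -- placeholder path
    "format" = "parquet",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
);

-- Generated column: avg_score is computed from scores and stored at load time.
CREATE TABLE exam_results (
    id        INT NOT NULL,
    scores    ARRAY<INT> NOT NULL,
    avg_score DOUBLE AS array_avg(scores)
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH (id);
```

Random bucketing keeps the DDL short, while the hash-bucketed exam_results table reflects the recommendation above to keep hash bucketing where query performance matters most.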
SQL reference
- Struct functions: struct (row), named_struct
- Map functions: str_to_map, map_concat, map_from_arrays, element_at, distinct_map_keys, cardinality
- Higher-order Map functions: map_filter, map_apply, transform_keys, transform_values
- Array functions: array_agg supports ORDER BY, array_generate, element_at, cardinality
- Higher-order Array functions: all_match, any_match
- Aggregate functions: min_by, percentile_disc
- Table functions: generate_series
Improvements
Shared-data cluster
- Optimized the data cache in StarRocks shared-data clusters. The optimized data cache allows for specifying the range of hot data. It can also prevent queries against cold data from occupying the local disk cache, thereby ensuring the performance of queries against hot data.
Materialized view
- Optimized the creation of an asynchronous materialized view:
- Supports random bucketing. If users do not specify bucketing columns, StarRocks adopts random bucketing by default.
- Supports using ORDER BY to specify a sort key.
- Supports specifying attributes such as colocate_group, storage_medium, and storage_cooldown_time.
- Supports using session variables. Users can configure these variables by using the properties("session.<variable_name>" = "") syntax to flexibly adjust view refreshing strategies.
- Supports creating materialized views based on views. This makes materialized views easier to use in data modeling scenarios, because users can flexibly use views and materialized views based on their varying needs to implement layered modeling.
- Optimized query rewrite with asynchronous materialized views:
- Supports Stale Rewrite, which allows materialized views that are not refreshed within a specified time interval to be used for query rewrite regardless of whether the base tables of the materialized views are updated. Users can specify the time interval by using the mv_rewrite_staleness_second property at materialized view creation.
- Supports rewriting View Delta Join queries against materialized views that are created on Hive catalog tables (a primary key and a foreign key must be defined).
- Optimized the mechanism for rewriting queries that contain union operations, and supports rewriting queries that contain joins or functions such as COUNT DISTINCT and time_slice.
- Optimized the refreshing of asynchronous materialized views:
- Optimized the mechanism for refreshing materialized views that are created on Hive catalog tables. StarRocks now can perceive partition-level data changes, and refreshes only the partitions with data changes during each automatic refresh.
- Supports using the REFRESH MATERIALIZED VIEW WITH SYNC MODE syntax to synchronously invoke materialized view refresh tasks.
- Enhanced the use of asynchronous materialized views:
- Supports using ALTER MATERIALIZED VIEW {ACTIVE | INACTIVE} to enable or disable a materialized view. Materialized views that are disabled (in the INACTIVE state) cannot be refreshed or used for query rewrite, but can be directly queried.
- Supports using ALTER MATERIALIZED VIEW SWAP WITH to swap two materialized views. Users can create a new materialized view and then perform an atomic swap with an existing materialized view to implement schema changes on the existing materialized view.
- Optimized synchronous materialized views:
- Supports direct queries against synchronous materialized views using SQL hints [SYNC_MV], which provides a workaround for the rare cases in which a query cannot be properly rewritten.
- Supports more expressions, such as CASE-WHEN, CAST, and mathematical operations, which make materialized views suitable for more business scenarios.
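A minimal sketch of the management statements named in the materialized view items above. The view names order_mv and order_mv_v2 are hypothetical.

```sql
-- Run a refresh task synchronously: the statement returns after the task finishes.
REFRESH MATERIALIZED VIEW order_mv WITH SYNC MODE;

-- Disable the view. While INACTIVE it is not refreshed or used for
-- query rewrite, but it can still be queried directly.
ALTER MATERIALIZED VIEW order_mv INACTIVE;

-- Enable it again.
ALTER MATERIALIZED VIEW order_mv ACTIVE;

-- Atomically swap the existing view with a newly built one,
-- for example to roll out a changed definition.
ALTER MATERIALIZED VIEW order_mv SWAP WITH order_mv_v2;
```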
Data Lake analytics
- Optimized metadata caching and access for Iceberg to improve Iceberg data query performance.
- Optimized the data cache to further improve data lake analytics performance.
Storage engine, data ingestion, and query
- Supports partial updates in column mode. Users can enable the column mode when they perform partial updates on Primary Key tables by using the UPDATE statement. The column mode is suitable for updating a small number of columns but a large number of rows, and can improve the updating performance by up to 10 times.
- Optimized the collection of statistics for the CBO. This reduces the impact of statistics collection on data ingestion and increases statistics collection performance.
- Optimized the merge algorithm to increase the overall performance by up to 2 times in permutation scenarios.
- Optimized the query logic to reduce dependency on database locks.
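The column-mode partial update described above might be used as follows. This is a sketch under assumptions: the variable name partial_update_mode and the value 'column' are not stated in these notes, and the table and columns are hypothetical, so confirm the exact switch in the UPDATE documentation for your version.

```sql
-- Assumption: column mode is enabled through a session variable.
SET partial_update_mode = 'column';

-- Update one column across many rows of a Primary Key table.
UPDATE orders
SET status = 'closed'
WHERE order_date < '2023-01-01';
```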
SQL reference
- Conditional functions case, coalesce, if, ifnull, and nullif support the ARRAY, MAP, STRUCT, and JSON data types.
- The following Array functions support nested types MAP, STRUCT, and ARRAY:
- array_agg
- array_contains, array_contains_all, array_contains_any
- array_slice, array_concat
- array_length, array_append, array_remove, array_position
- reverse, array_distinct, array_intersect, arrays_overlap
- array_sortby
- The following Array functions support the Fast Decimal data type:
- array_agg
- array_append, array_remove, array_position, array_contains
- array_length
- array_max, array_min, array_sum, array_avg
- arrays_overlap, array_difference
- array_slice, array_distinct, array_sort, reverse, array_intersect, array_concat
- array_sortby, array_contains_all, array_contains_any
Bug Fixes
Fixed the following issues:
- Requests to reconnect to Kafka for...
2.5.8
Release date: June 30, 2023
Improvements
- Optimized the error message reported when partitions are added to a non-partitioned table. #25266
- Optimized the auto tablet distribution policy for tables. #24543
- Optimized the default comments in the CREATE TABLE statement. #24803
- You can initiate synchronous manual refresh tasks for asynchronous materialized views using REFRESH MATERIALIZED VIEW WITH SYNC MODE. #25910
Bug Fixes
Fixed the following issues:
- The COUNT result of an asynchronous materialized view may be inaccurate if the materialized view is built on Union results. #24460
- "Unknown error" is reported when users attempt to forcibly reset the root password. #25492
- An inaccurate error message is displayed when INSERT OVERWRITE is executed on a cluster with fewer than three alive BEs. #25314
3.0.3
Release date: June 28, 2023
Improvements
- Metadata synchronization of StarRocks external tables has been changed to occur during data loading. #24739
- Users can specify partitions when they run INSERT OVERWRITE on tables whose partitions are automatically created. For more information, see Automatic partitioning. #25005
- Optimized the error message reported when partitions are added to a non-partitioned table. #25266
Bug Fixes
Fixed the following issues:
- The min/max filter gets the wrong Parquet field when the Parquet file contains complex data types. #23976
- Load tasks are still queuing even when the related database or table has been dropped. #24801
- There is a low probability that an FE restart may cause BEs to crash. #25037
- Load and query jobs occasionally freeze when the variable enable_profile is set to true. #25060
- An inaccurate error message is displayed when INSERT OVERWRITE is executed on a cluster with fewer than three alive BEs. #25314
2.5.7
Release date: June 14, 2023
New features
- Inactive materialized views can be manually activated using ALTER MATERIALIZED VIEW <mv_name> ACTIVE. You can use this SQL command to activate materialized views whose base tables were dropped and then recreated. For more information, see ALTER MATERIALIZED VIEW. #24001
- StarRocks can automatically set an appropriate number of tablets when you create a table or add a partition, eliminating the need for manual operations. For more information, see Determine the number of tablets. #10614
Improvements
- Optimized the I/O concurrency of Scan nodes used in external table queries, which reduces memory usage and improves the stability of data loading from external tables. #23617 #23624 #23626
- Optimized the error message for Broker Load jobs. The error message contains retry information and the name of erroneous files. #18038 #21982
- Optimized the error message returned when CREATE TABLE times out and added parameter tuning tips. #24510
- Optimized the error message returned when ALTER TABLE fails because the table status is not Normal. #24381
- Ignores full-width spaces in the CREATE TABLE statement. #23885
- Optimized the Broker access timeout to increase the success rate of Broker Load jobs. #22699
- For Primary Key tables, the VersionCount field returned by SHOW TABLET contains Rowsets that are in the Pending state. #23847
- Optimized the Persistent Index policy. #22140
Bug Fixes
Fixed the following issues:
- When users load Parquet data into StarRocks, DATETIME values overflow during type conversion, causing data errors. #22356
- Bucket information is lost after Dynamic Partitioning is disabled. #22595
- Using unsupported properties in the CREATE TABLE statement causes null pointer exceptions (NPEs). #23859
- Table permission filtering in information_schema becomes ineffective. As a result, users can view tables they do not have permission to. #23804
- Information returned by SHOW TABLE STATUS is incomplete. #24279
- Schema changes on Primary Key tables hang if data loading occurs simultaneously with the schema change. #23456
- RocksDB WAL flush blocks the brpc worker from processing bthreads, which interrupts high-frequency data loading into Primary Key tables. #22489
- TIME-type columns that are not supported in StarRocks can be successfully created. #23474
- Materialized view Union rewrite fails. #22922
3.0.2
Release date: June 13, 2023
Improvements
- Predicates in a UNION query can be pushed down after the query is rewritten by an asynchronous materialized view. #23312
- Optimized the auto tablet distribution policy for tables. #24543
- Removed the dependency of NetworkTime on system clocks, which fixes incorrect NetworkTime caused by inconsistent system clocks across servers. #24858
Bug Fixes
Fixed the following issues:
- Schema changes on Primary Key tables hang if data loading occurs simultaneously with the schema change. #23456
- Queries encounter an error when the session variable pipeline_profile_level is set to 0. #23873
- CREATE TABLE encounters an error when cloud_native_storage_type is set to S3.
- LDAP authentication succeeds even when no password is used. #24862
- CANCEL LOAD fails if the table involved in the load job does not exist. #24922
3.0.1
New Features
- [Preview] Supports spilling intermediate computation results of large operators to disks to reduce the memory consumption of large operators. For more information, see Spill to disk.
- Routine Load supports loading Avro-formatted data.
- Supports Microsoft Azure Storage (including Azure Blob Storage and Azure Data Lake Storage).
Improvements
- Shared-data clusters support StarRocks external tables.
- Added load_tracking_logs to Information Schema to record recent loading errors.
- Ignores special characters in CREATE TABLE statements. #23885
Bug Fixes
Fixed the following issues:
- Information returned by SHOW CREATE TABLE is incorrect for Primary Key tables. #24237
- BEs may crash during a Routine Load job. #20677
- Null pointer exception (NPE) occurs if you specify unsupported properties when creating a partitioned table. #21374
- Information returned by SHOW TABLE STATUS is incomplete. #24279
2.5.6
Release date: May 19, 2023
Improvements
- Optimized the error message reported when INSERT INTO ... SELECT expires due to a small thrift_server_max_worker_thread value. #21964
- Tables created using CTAS have three replicas by default, which is consistent with the default replica number for common tables. #22854
Bug Fixes
- Truncating partitions fails because the TRUNCATE operation is case-sensitive to partition names. #21809
- Decommissioning a BE fails because temporary partitions cannot be created for materialized views. #22745
- Dynamic FE parameters that require an ARRAY value cannot be set to an empty array. #22225
- Materialized views with the partition_refresh_number property specified may fail to completely refresh. #21619
- SHOW CREATE TABLE masks cloud credential information, which causes incorrect credential information in memory. #21311
- Predicates cannot take effect on some ORC files that are queried via external tables. #21901
- The min-max filter cannot properly handle lower- and upper-case letters in column names. #22626
- Late materialization causes errors in querying complex data types (STRUCT or MAP). #22862
- The issue that occurs when restoring a Primary Key table. #23384
3.0.0
Release date: April 28, 2023
New Features
System architecture
- Decoupled storage and compute. StarRocks now supports data persistence into S3-compatible object storage, enhancing resource isolation, reducing storage costs, and making compute resources more scalable. Local disks are used as a hot data cache to boost query performance. When the local cache is hit, the query performance of the new shared-data architecture is comparable to that of the classic shared-nothing architecture. For more information, see Deploy and use shared-data StarRocks.
Storage engine and data ingestion
- The AUTO_INCREMENT attribute is supported to provide globally unique IDs, which simplifies data management.
- Automatic partitioning and partitioning expressions are supported, which makes partition creation easier to use and more flexible.
- Primary Key tables support more complete UPDATE and DELETE syntax, including the use of CTEs and references to multiple tables.
- Added Load Profile for Broker Load and INSERT INTO jobs. You can view the details of a load job by querying the load profile. The usage is the same as Analyze query profile.
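The AUTO_INCREMENT and automatic partitioning items above can be sketched with two small tables. The table and column names are hypothetical, and the two features are shown separately to keep each example minimal.

```sql
-- AUTO_INCREMENT: StarRocks assigns a globally unique user_id
-- when the column is omitted from the INSERT.
CREATE TABLE app_users (
    user_name VARCHAR(64) NOT NULL,
    user_id   BIGINT NOT NULL AUTO_INCREMENT
)
PRIMARY KEY (user_name)
DISTRIBUTED BY HASH (user_name);

INSERT INTO app_users (user_name) VALUES ('alice'), ('bob');

-- Partitioning expression: daily partitions are created automatically
-- as data for new days is loaded.
CREATE TABLE site_access (
    event_day DATETIME NOT NULL,
    site_id   INT,
    pv        BIGINT
)
DUPLICATE KEY (event_day, site_id)
PARTITION BY date_trunc('day', event_day)
DISTRIBUTED BY HASH (event_day, site_id);
```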
Data Lake Analytics
- [Preview] Supports Presto/Trino compatible dialect. Presto/Trino's SQL can be automatically rewritten into StarRocks' SQL pattern. For more information, see the system variable sql_dialect.
- [Preview] Supports JDBC catalogs.
- Supports using SET CATALOG to manually switch between catalogs in the current session.
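For instance, switching catalogs within a session could look like the sketch below. The name hive_catalog is a hypothetical external catalog that would have to exist already; default_catalog is the internal catalog.

```sql
-- Switch the current session to an external catalog (hypothetical name).
SET CATALOG hive_catalog;
SHOW DATABASES;

-- Switch back to the internal catalog.
SET CATALOG default_catalog;
```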
Privileges and security
- Provides a new privilege system with full RBAC functionalities, supporting role inheritance and default roles. For more information, see Overview of privileges.
- Provides more privilege management objects and more fine-grained privileges. For more information, see Privileges supported by StarRocks.
Query engine
- Allows more queries on joined tables to benefit from the query cache. For example, the query cache now supports Broadcast Join and Bucket Shuffle Join.
- Supports Global UDFs.
- Dynamic adaptive parallelism: StarRocks can automatically adjust the pipeline_dop parameter for query concurrency.
SQL reference
- Added the following privilege-related SQL statements: SET DEFAULT ROLE, SET ROLE, SHOW ROLES, and SHOW USERS.
- Added the following semi-structured data analysis functions: map_apply, map_from_arrays, map_filter, transform_keys, and transform_values.
- array_agg supports ORDER BY.
- Window functions lead and lag support IGNORE NULLS.
- Added string functions replace, hex_decode_binary, and hex_decode_string.
- Added encryption functions base64_decode_binary and base64_decode_string.
- Added math functions sinh, cosh, and tanh.
- Added utility function current_role.
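A few of the additions above, shown as a hedged sketch. The daily_sales table and its columns are hypothetical, and the placement of IGNORE NULLS inside the function call is written as an assumption about the syntax, so check it against the window function reference.

```sql
-- New math functions.
SELECT sinh(0), cosh(0), tanh(0);

-- Role that is active in the current session.
SELECT current_role();

-- lag skips NULL amounts when looking back one row.
SELECT sale_date,
       amount,
       lag(amount IGNORE NULLS, 1) OVER (ORDER BY sale_date) AS prev_amount
FROM daily_sales;

-- array_agg with ORDER BY: amounts per region, ordered by date.
SELECT region,
       array_agg(amount ORDER BY sale_date) AS amounts
FROM daily_sales
GROUP BY region;
```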
Improvements
Deployment
- Updated Docker image and the related Docker deployment document for version 3.0. #20623 #21021
Storage engine and data ingestion
- Supports more CSV parameters for data ingestion, including SKIP_HEADER, TRIM_SPACE, ENCLOSE, and ESCAPE. See STREAM LOAD, BROKER LOAD, and ROUTINE LOAD.
- The primary key and sort key are decoupled in Primary Key tables. The sort key can be separately specified in ORDER BY when you create a table.
- Optimized the memory usage of data ingestion into Primary Key tables in scenarios such as large-volume ingestion, partial updates, and persistent primary indexes.
- Supports creating asynchronous INSERT tasks. For more information, see INSERT and SUBMIT TASK. #20609
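The decoupled sort key and the asynchronous INSERT task mentioned above might be used as follows. The tables orders and orders_archive are hypothetical, and orders_archive is assumed to exist with a matching schema.

```sql
-- The primary key is order_id, while data is sorted by merchant_id.
CREATE TABLE orders (
    order_id    BIGINT NOT NULL,
    merchant_id INT NOT NULL,
    total       DECIMAL(10, 2)
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH (order_id)
ORDER BY (merchant_id);

-- Submit the INSERT as an asynchronous task instead of waiting for it.
SUBMIT TASK AS
INSERT INTO orders_archive SELECT * FROM orders;
```

Choosing a sort key that matches common filter columns, rather than the primary key, is the main reason to specify ORDER BY separately.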
Materialized view
- Optimized the rewriting capabilities of materialized views, including:
- Supports rewrite of View Delta Join, Outer Join, and Cross Join.
- Optimized SQL rewrite of Union with partition.
- Improved materialized view building capabilities: supporting CTE, select *, and Union.
- Optimized the information returned by SHOW MATERIALIZED VIEWS.
- Supports adding MV partitions in batches, which improves the efficiency of partition addition during materialized view building. #21167
Query engine
- All operators are supported in the pipeline engine. Non-pipeline code will be removed in later versions.
- Improved Big Query Positioning and added big query log. SHOW PROCESSLIST supports viewing CPU and memory information.
- Optimized Outer Join Reorder.
- Optimized error messages in the SQL parsing stage, providing more accurate error positioning and clearer error messages.
Data Lake Analytics
- Optimized metadata statistics collection.
- Supports using SHOW CREATE TABLE to view the creation statements of the tables that are managed by an external catalog and are stored in Apache Hive™, Apache Iceberg, Apache Hudi, or Delta Lake.
Bug Fixes
- Some URLs in the license header of StarRocks' source file cannot be accessed. #2224
- An unknown error is returned during SELECT queries. #19731
- Supports SHOW/SET CHARACTER. #17480
- When the loaded data exceeds the field length supported by StarRocks, the error message returned is not correct. #14
- Supports show full fields from 'table'. #17233
- Partition pruning causes MV rewrites to fail. #14641
- MV rewrite fails when the CREATE MATERIALIZED VIEW statement contains count(distinct) and count(distinct) is applied to the DISTRIBUTED BY column. [#16558]...
2.5.5
New features
Added a metric to monitor the tablet status of Primary Key tables:
- Added the FE metric err_state_metric.
- Added the ErrorStateTabletNum column to the output of SHOW PROC '/statistic/' to display the number of err_state tablets.
- Added the ErrorStateTablets column to the output of SHOW PROC '/statistic/<db_id>/' to display the IDs of err_state tablets.
For more information, see SHOW PROC.
Improvements
- Optimized the disk balancing speed when multiple BEs are added. #19418
- Optimized the inference of storage_medium. When BEs use both SSD and HDD as storage devices, if the property storage_cooldown_time is specified, StarRocks sets storage_medium to SSD. Otherwise, StarRocks sets storage_medium to HDD. #18649
- Optimized the performance of Unique Key tables by forbidding the collection of statistics from value columns. #19563
Bug Fixes
- For Colocation tables, the replica status can be manually specified as bad by using statements like ADMIN SET REPLICA STATUS PROPERTIES ("tablet_id" = "10003", "backend_id" = "10001", "status" = "bad");. If the number of BEs is less than or equal to the number of replicas, the corrupted replica cannot be repaired. #17876
- After a BE is started, its process exists but the BE port cannot be enabled. #19347
- Wrong results are returned for aggregate queries whose subquery is nested with a window function. #19725
- auto_refresh_partitions_limit does not take effect when the materialized view (MV) is refreshed for the first time. As a result, all the partitions are refreshed. #19759
- An error occurs when querying a CSV Hive external table whose array data is nested with complex data such as MAP and STRUCT. #20233
- Queries that use Spark connector time out. #20264
- If one replica of a two-replica table is corrupted, the table cannot recover. #20681
- Query failure caused by MV query rewrite failure. #19549
- The metric interface expires due to database lock. #20790
- Wrong results are returned for Broadcast Join. #20952
- NPE is returned when an unsupported data type is used in CREATE TABLE. #20999
- The issue caused by using window_funnel() with the Query Cache feature. #21474
- Optimization plan selection takes an unexpectedly long time after the CTE is rewritten. #16515