20 Jul 11:06

bearmmx

ba43477

2.5.9

Release date: July 19, 2023

New features

Queries that contain a different type of join than the materialized view can be rewritten. #25099

Improvements

StarRocks external tables whose destination cluster is the current StarRocks cluster cannot be created. #25441
If the queried fields are not included in the output columns of a materialized view but are included in the predicate of the materialized view, the query can still be rewritten. #23028
Added a new field table_id to the table tables_config in the database information_schema. You can join tables_config with be_tablets on the column table_id to query the names of the database and table to which a tablet belongs. #24061
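
The join on table_id described in the last item can be sketched as follows. This is only a toy model: it uses in-memory SQLite tables as stand-ins for tables_config and be_tablets, and every column name other than table_id (table_schema, table_name, tablet_id), as well as all row values, is an assumption made for illustration rather than something stated in this release note.

```python
import sqlite3

# Toy stand-ins for information_schema.tables_config and information_schema.be_tablets.
# Only table_id comes from the release note; all other column names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tables_config (table_schema TEXT, table_name TEXT, table_id INTEGER);
    CREATE TABLE be_tablets (tablet_id INTEGER, table_id INTEGER);
    INSERT INTO tables_config VALUES ('sales', 'orders', 1001), ('sales', 'users', 1002);
    INSERT INTO be_tablets VALUES (20001, 1001), (20002, 1001), (20003, 1002);
""")


def owner_of(tablet_id):
    """Return (database, table) for a tablet by joining the two tables on table_id."""
    return conn.execute(
        """
        SELECT c.table_schema, c.table_name
        FROM be_tablets AS t
        JOIN tables_config AS c ON t.table_id = c.table_id
        WHERE t.tablet_id = ?
        """,
        (tablet_id,),
    ).fetchone()


print(owner_of(20003))
```

Against a real cluster, the same SELECT would presumably be issued over a MySQL-protocol connection to the FE, with the actual column names checked against the information_schema documentation first.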

Bug Fixes

Fixed the following issues:

Count Distinct result is incorrect for Duplicate Key tables. #24222
BEs may crash if the Join key is a large BINARY column. #25084
The INSERT operation hangs if the length of CHAR data in a STRUCT to be inserted exceeds the maximum CHAR length defined in the STRUCT column. #25942
The result of coalesce() is incorrect. #26250
The version number for a tablet is inconsistent between the BE and FE after data is restored. #26518
Partitions cannot be automatically created for recovered tables. #26813

11 Jul 19:26

Dshadowzh

3.1.0-rc01

64ca37e

3.1.0-rc01 Pre-release

3.1.0-RC01
Release date: July 7, 2023

New Features

Shared-data cluster

Added support for Primary Key tables, on which persistent indexes cannot be enabled.
Supports the AUTO_INCREMENT column attribute, which enables a globally unique ID for each data row and thus simplifies data management.
Supports automatically creating partitions during loading and using partitioning expressions to define partitioning rules, thereby making partition creation easier to use and more flexible.
[Preview] Supports storing data on Azure Blob Storage.

Data Lake analytics

Supports accessing Parquet-formatted Iceberg v2 tables.
[Preview] Supports sinking data to Iceberg tables in Parquet format.
Supports accessing data stored in Elasticsearch by using Elasticsearch catalogs. This simplifies the creation of Elasticsearch external tables.

Storage engine, data ingestion, and query

Supports random bucketing, which removes the need to configure bucketing columns at table creation. In scenarios with large data volumes and demanding performance requirements, we recommend that you continue using hash bucketing.
Supports using the FILES keyword (a table-valued function) in INSERT INTO to directly load data from Parquet- or ORC-formatted files stored in AWS S3.
Supports generated columns. With the generated column feature, StarRocks can automatically generate and store the values of column expressions and automatically rewrite queries to improve query performance.
Supports loading data into columns of the MAP and STRUCT data types, and supports nesting Fast Decimal values in ARRAY, MAP, and STRUCT.

SQL reference

Struct functions: struct (row), named_struct
Map functions: str_to_map, map_concat, map_from_arrays, element_at, distinct_map_keys, cardinality
Higher-order Map functions: map_filter, map_apply, transform_keys, transform_values
Array functions: array_agg supports ORDER BY, array_generate, element_at, cardinality
Higher-order Array functions: all_match, any_match
Aggregate functions: min_by, percentile_disc
Table functions: generate_series

Improvements

Shared-data cluster

Optimized the data cache in StarRocks shared-data clusters. The optimized data cache allows for specifying the range of hot data. It can also prevent queries against cold data from occupying the local disk cache, thereby ensuring the performance of queries against hot data.

Materialized view

Optimized the creation of an asynchronous materialized view:
- Supports random bucketing. If users do not specify bucketing columns, StarRocks adopts random bucketing by default.
- Supports using ORDER BY to specify a sort key.
- Supports specifying attributes such as colocate_group, storage_medium, and storage_cooldown_time.
- Supports using session variables. Users can configure these variables by using the properties("session.<variable_name>" = "") syntax to flexibly adjust view refreshing strategies.
- Supports creating materialized views based on views. This makes materialized views easier to use in data modeling scenarios, because users can flexibly use views and materialized views based on their varying needs to implement layered modeling.
Optimized query rewrite with asynchronous materialized views:
- Supports Stale Rewrite, which allows a materialized view whose last refresh falls within a specified time interval to be used for query rewrite, regardless of whether its base tables have been updated since. Users can specify the time interval by using the mv_rewrite_staleness_second property at materialized view creation.
- Supports rewriting View Delta Join queries against materialized views that are created on Hive catalog tables (a primary key and a foreign key must be defined).
- Optimized the mechanism for rewriting queries that contain union operations, and supports rewriting queries that contain joins or functions such as COUNT DISTINCT and time_slice.
Optimized the refreshing of asynchronous materialized views:
- Optimized the mechanism for refreshing materialized views that are created on Hive catalog tables. StarRocks now can perceive partition-level data changes, and refreshes only the partitions with data changes during each automatic refresh.
- Supports using the REFRESH MATERIALIZED VIEW WITH SYNC MODE syntax to synchronously invoke materialized view refresh tasks.
Enhanced the use of asynchronous materialized views:
- Supports using ALTER MATERIALIZED VIEW {ACTIVE | INACTIVE} to enable or disable a materialized view. Materialized views that are disabled (in the INACTIVE state) cannot be refreshed or used for query rewrite, but can be directly queried.
- Supports using ALTER MATERIALIZED VIEW SWAP WITH to swap two materialized views. Users can create a new materialized view and then perform an atomic swap with an existing materialized view to implement schema changes on the existing materialized view.
Optimized synchronous materialized views:
- Supports direct queries against synchronous materialized views by using the SQL hint [SYNC_MV], which works around rare cases in which some queries cannot be rewritten properly.
- Supports more expressions, such as CASE-WHEN, CAST, and mathematical operations, which make materialized views suitable for more business scenarios.
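
As a rough illustration of the Stale Rewrite item above, the decision can be modeled as a freshness check. This is a hypothetical sketch, not StarRocks code: the function name, its parameters, and the inclusive boundary are assumptions, and it reads mv_rewrite_staleness_second as the maximum allowed age, in seconds, of the last refresh.

```python
from datetime import datetime, timedelta


def usable_for_rewrite(last_refresh, now, staleness_seconds, base_tables_updated):
    """Hypothetical model of the Stale Rewrite decision (not StarRocks code).

    staleness_seconds plays the role of mv_rewrite_staleness_second.
    """
    if not base_tables_updated:
        # The view still matches its base tables, so staleness does not matter.
        return True
    # Base tables changed: the view stays usable while its last refresh is recent enough.
    # Treating the boundary as inclusive is an assumption.
    return now - last_refresh <= timedelta(seconds=staleness_seconds)


now = datetime(2023, 7, 7, 12, 0, 0)
refreshed_5_min_ago = now - timedelta(minutes=5)

print(usable_for_rewrite(refreshed_5_min_ago, now, 600, base_tables_updated=True))
print(usable_for_rewrite(refreshed_5_min_ago, now, 60, base_tables_updated=True))
```

With a 600-second window the five-minute-old view is still eligible even though its base tables changed; with a 60-second window it is not.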

Data Lake analytics

Optimized metadata caching and access for Iceberg to improve Iceberg data query performance.
Optimized the data cache to further improve data lake analytics performance.

Storage engine, data ingestion, and query

Supports partial updates in column mode. Users can enable the column mode when they perform partial updates on Primary Key tables by using the UPDATE statement. The column mode is suitable for updating a small number of columns but a large number of rows, and can improve the updating performance by up to 10 times.
Optimized the collection of statistics for the CBO. This reduces the impact of statistics collection on data ingestion and increases statistics collection performance.
Optimized the merge algorithm to increase the overall performance by up to 2 times in permutation scenarios.
Optimized the query logic to reduce dependency on database locks.

SQL reference

Conditional functions case, coalesce, if, ifnull, and nullif support the ARRAY, MAP, STRUCT, and JSON data types.
The following Array functions support nested types MAP, STRUCT, and ARRAY:
- array_agg
- array_contains, array_contains_all, array_contains_any
- array_slice, array_concat
- array_length, array_append, array_remove, array_position
- reverse, array_distinct, array_intersect, arrays_overlap
- array_sortby
The following Array functions support the Fast Decimal data type:
- array_agg
- array_append, array_remove, array_position, array_contains
- array_length
- array_max, array_min, array_sum, array_avg
- arrays_overlap, array_difference
- array_slice, array_distinct, array_sort, reverse, array_intersect, array_concat
- array_sortby, array_contains_all, array_contains_any

Bug Fixes

Fixed the following issues:

Requests to reconnect to Kafka for...

04 Jul 07:38

bearmmx

2.5.8

0a371e0

2.5.8

Release date: June 30, 2023

Improvements

Optimized the error message reported when partitions are added to a non-partitioned table. #25266
Optimized the auto tablet distribution policy for tables. #24543
Optimized the default comments in the CREATE TABLE statement. #24803
You can initiate synchronous manual refresh tasks for asynchronous materialized views using REFRESH MATERIALIZED VIEW WITH SYNC MODE. #25910

Bug Fixes

Fixed the following issues:

The COUNT result of an asynchronous materialized view may be inaccurate if the materialized view is built on Union results. #24460
"Unknown error" is reported when users attempt to forcibly reset the root password. #25492
An inaccurate error message is displayed when INSERT OVERWRITE is executed on a cluster with fewer than three alive BEs. #25314

04 Jul 07:36

bearmmx

3.0.3

fe5e3a1

3.0.3

Release date: June 28, 2023

Improvements

Metadata synchronization of StarRocks external tables has been changed to occur during data loading. #24739
Users can specify partitions when they run INSERT OVERWRITE on tables whose partitions are automatically created. For more information, see Automatic partitioning. #25005
Optimized the error message reported when partitions are added to a non-partitioned table. #25266

Bug Fixes

Fixed the following issues:

The min/max filter gets the wrong Parquet field when the Parquet file contains complex data types. #23976
Load tasks are still queuing even when the related database or table has been dropped. #24801
There is a low probability that an FE restart may cause BEs to crash. #25037
Load and query jobs occasionally freeze when the variable enable_profile is set to true. #25060
An inaccurate error message is displayed when INSERT OVERWRITE is executed on a cluster with fewer than three alive BEs. #25314

14 Jun 11:49

bearmmx

2.5.7

8dc1b68

2.5.7

Release date: June 14, 2023

New features

Inactive materialized views can be manually activated using ALTER MATERIALIZED VIEW <mv_name> ACTIVE. You can use this SQL command to activate materialized views whose base tables were dropped and then recreated. For more information, see ALTER MATERIALIZED VIEW. #24001
StarRocks can automatically set an appropriate number of tablets when you create a table or add a partition, eliminating the need for manual operations. For more information, see Determine the number of tablets. #10614

Improvements

Optimized the I/O concurrency of Scan nodes used in external table queries, which reduces memory usage and improves the stability of data loading from external tables. #23617 #23624 #23626
Optimized the error message for Broker Load jobs. The error message now contains retry information and the names of erroneous files. #18038 #21982
Optimized the error message returned when CREATE TABLE times out and added parameter tuning tips. #24510
Optimized the error message returned when ALTER TABLE fails because the table status is not Normal. #24381
Ignores full-width spaces in the CREATE TABLE statement. #23885
Optimized the Broker access timeout to increase the success rate of Broker Load jobs. #22699
For Primary Key tables, the VersionCount field returned by SHOW TABLET contains Rowsets that are in the Pending state. #23847
Optimized the Persistent Index policy. #22140

Bug Fixes

Fixed the following issues:

When users load Parquet data into StarRocks, DATETIME values overflow during type conversion, causing data errors. #22356
Bucket information is lost after Dynamic Partitioning is disabled. #22595
Using unsupported properties in the CREATE TABLE statement causes null pointer exceptions (NPEs). #23859
Table permission filtering in information_schema becomes ineffective. As a result, users can view tables that they do not have permission to access. #23804
Information returned by SHOW TABLE STATUS is incomplete. #24279
Schema change for Primary Key tables hangs if data loading occurs at the same time. #23456
RocksDB WAL flush blocks the brpc worker from processing bthreads, which interrupts high-frequency data loading into Primary Key tables. #22489
TIME-type columns that are not supported in StarRocks can be successfully created. #23474
Materialized view Union rewrite fails. #22922

14 Jun 08:55

bearmmx

3.0.2

c833698

3.0.2

Release date: June 13, 2023

Improvements

Predicates in a UNION query can be pushed down after the query is rewritten by an asynchronous materialized view. #23312
Optimized the auto tablet distribution policy for tables. #24543
Removed the dependency of NetworkTime on system clocks, which fixes incorrect NetworkTime caused by inconsistent system clocks across servers. #24858

Bug Fixes

Fixed the following issues:

Schema change for Primary Key tables hangs if data loading occurs at the same time. #23456
Queries encounter an error when the session variable pipeline_profile_level is set to 0. #23873
CREATE TABLE encounters an error when cloud_native_storage_type is set to S3.
LDAP authentication succeeds even when no password is used. #24862
CANCEL LOAD fails if the table involved in the load job does not exist. #24922

02 Jun 23:29

Dshadowzh

3.0.1

a1f411b

3.0.1

New Features

[Preview] Supports spilling intermediate computation results of large operators to disks to reduce the memory consumption of large operators. For more information, see Spill to disk.
Routine Load supports loading Avro-formatted data.
Supports Microsoft Azure Storage (including Azure Blob Storage and Azure Data Lake Storage).

Improvements

Shared-data clusters support StarRocks external tables.
Added load_tracking_logs to Information Schema to record recent loading errors.
Ignores special characters in CREATE TABLE statements. #23885

Bug Fixes

Fixed the following issues:

Information returned by SHOW CREATE TABLE is incorrect for Primary Key tables. #24237
BEs may crash during a Routine Load job. #20677
Null pointer exception (NPE) occurs if you specify unsupported properties when creating a partitioned table. #21374
Information returned by SHOW TABLE STATUS is incomplete. #24279

24 May 06:29

wangsimo0

2.5.6

a193ae0

2.5.6

Release date: May 19, 2023

Improvements

Optimized the error message reported when INSERT INTO ... SELECT expires due to a small thrift_server_max_worker_thread value. #21964
Tables created using CTAS have three replicas by default, which is consistent with the default replica number for common tables. #22854

Bug Fixes

Truncating partitions fails because the TRUNCATE operation is case-sensitive to partition names. #21809
Decommissioning BE fails due to the failure in creating temporary partitions for materialized views. #22745
Dynamic FE parameters that require an ARRAY value cannot be set to an empty array. #22225
Materialized views with the partition_refresh_number property specified may fail to completely refresh. #21619
SHOW CREATE TABLE masks cloud credential information, which causes incorrect credential information in memory. #21311
Predicates cannot take effect on some ORC files that are queried via external tables. #21901
The min-max filter cannot properly handle lower- and upper-case letters in column names. #22626
Late materialization causes errors in querying complex data types (STRUCT or MAP). #22862
The issue that occurs when restoring a Primary Key table. #23384

17 May 03:01

wangsimo0

3.0.0

48f4d81

3.0.0

Release date: April 28, 2023

New Features

System architecture

Decoupled storage and compute. StarRocks now supports persisting data in S3-compatible object storage, which enhances resource isolation, reduces storage costs, and makes compute resources more scalable. Local disks serve as a hot data cache to boost query performance. When the local cache is hit, the query performance of the new shared-data architecture is comparable to that of the classic shared-nothing architecture. For more information, see Deploy and use shared-data StarRocks.

Storage engine and data ingestion

The AUTO_INCREMENT attribute is supported to provide globally unique IDs, which simplifies data management.
Automatic partitioning and partitioning expressions are supported, which makes partition creation easier to use and more flexible.
Primary Key tables support more complete UPDATE and DELETE syntax, including the use of CTEs and references to multiple tables.
Added Load Profile for Broker Load and INSERT INTO jobs. You can view the details of a load job by querying the load profile. Usage is the same as described in Analyze query profile.

Data Lake Analytics

[Preview] Supports Presto/Trino compatible dialect. Presto/Trino's SQL can be automatically rewritten into StarRocks' SQL pattern. For more information, see the system variable sql_dialect.
[Preview] Supports JDBC catalogs.
Supports using SET CATALOG to manually switch between catalogs in the current session.

Privileges and security

Provides a new privilege system with full RBAC functionalities, supporting role inheritance and default roles. For more information, see Overview of privileges.
Provides more privilege management objects and more fine-grained privileges. For more information, see Privileges supported by StarRocks.

Query engine

Allows more queries on joined tables to benefit from the query cache. For example, the query cache now supports Broadcast Join and Bucket Shuffle Join.
Supports Global UDFs.
Dynamic adaptive parallelism: StarRocks can automatically adjust the pipeline_dop parameter for query concurrency.

SQL reference

Added the following privilege-related SQL statements: SET DEFAULT ROLE, SET ROLE, SHOW ROLES, and SHOW USERS.
Added the following semi-structured data analysis functions: map_apply, map_from_arrays, map_filter, transform_keys, and transform_values.
array_agg supports ORDER BY.
Window functions lead and lag support IGNORE NULLS.
Added string functions replace, hex_decode_binary, and hex_decode_string.
Added encryption functions base64_decode_binary and base64_decode_string.
Added math functions sinh, cosh, and tanh.
Added utility function current_role.

Improvements

Deployment

Updated Docker image and the related Docker deployment document for version 3.0. #20623 #21021

Storage engine and data ingestion

Supports more CSV parameters for data ingestion, including SKIP_HEADER, TRIM_SPACE, ENCLOSE, and ESCAPE. See STREAM LOAD, BROKER LOAD, and ROUTINE LOAD.
The primary key and sort key are decoupled in Primary Key tables. The sort key can be separately specified in ORDER BY when you create a table.
Optimized the memory usage of data ingestion into Primary Key tables in scenarios such as large-volume ingestion, partial updates, and persistent primary indexes.
Supports creating asynchronous INSERT tasks. For more information, see INSERT and SUBMIT TASK. #20609

Materialized view

Optimized the rewriting capabilities of materialized views, including:
- Supports rewrite of View Delta Join, Outer Join, and Cross Join.
- Optimized SQL rewrite of Union with partition.
Improved materialized view building capabilities: supporting CTE, select *, and Union.
Optimized the information returned by SHOW MATERIALIZED VIEWS.
Supports adding MV partitions in batches, which improves the efficiency of partition addition during materialized view building. #21167

Query engine

All operators are supported in the pipeline engine. Non-pipeline code will be removed in later versions.
Improved Big Query Positioning and added big query log. SHOW PROCESSLIST supports viewing CPU and memory information.
Optimized Outer Join Reorder.
Optimized error messages in the SQL parsing stage, providing more accurate error positioning and clearer error messages.

Data Lake Analytics

Optimized metadata statistics collection.
Supports using SHOW CREATE TABLE to view the creation statements of the tables that are managed by an external catalog and are stored in Apache Hive™, Apache Iceberg, Apache Hudi, or Delta Lake.

Bug Fixes

Some URLs in the license header of StarRocks' source file cannot be accessed. #2224
An unknown error is returned during SELECT queries. #19731
Supports SHOW/SET CHARACTER. #17480
When the loaded data exceeds the field length supported by StarRocks, the error message returned is not correct. #14
Supports show full fields from 'table'. #17233
Partition pruning causes MV rewrites to fail. #14641
MV rewrite fails when the CREATE MATERIALIZED VIEW statement contains count(distinct) and count(distinct) is applied to the DISTRIBUTED BY column. [#16558]...

04 May 08:02

wangsimo0

2.5.5

24c1eca

2.5.5

New features

Added a metric to monitor the tablet status of Primary Key tables:

Added the FE metric err_state_metric.
Added the ErrorStateTabletNum column to the output of SHOW PROC '/statistic/' to display the number of err_state tablets.
Added the ErrorStateTablets column to the output of SHOW PROC '/statistic/<db_id>/' to display the IDs of err_state tablets.
For more information, see SHOW PROC.

Improvements

Optimized the disk balancing speed when multiple BEs are added. #19418
Optimized the inference of storage_medium. When BEs use both SSD and HDD as storage devices, if the property storage_cooldown_time is specified, StarRocks sets storage_medium to SSD. Otherwise, StarRocks sets storage_medium to HDD. #18649
Optimized the performance of Unique Key tables by forbidding the collection of statistics from value columns. #19563
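
The storage_medium inference rule described above can be sketched as a small function. This is a hypothetical model of the stated rule only, for a cluster whose BEs use both SSD and HDD; the function name and the dictionary representation of table properties are assumptions, as is the choice to return an explicitly set storage_medium unchanged.

```python
def infer_storage_medium(properties):
    """Hypothetical model of the inference rule for BEs that use both SSD and HDD.

    `properties` is a dict of table properties as a user might pass them at table creation.
    """
    # Assumption: an explicitly specified storage_medium is not overridden.
    if "storage_medium" in properties:
        return properties["storage_medium"]
    # Rule from the release note: a specified cooldown time implies SSD, otherwise HDD.
    return "SSD" if "storage_cooldown_time" in properties else "HDD"


print(infer_storage_medium({"storage_cooldown_time": "2023-12-31 00:00:00"}))
print(infer_storage_medium({}))
```

The first call yields SSD because a cooldown time is given; the second yields HDD.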

Bug Fixes

For Colocation tables, the replica status can be manually specified as bad by using statements like ADMIN SET REPLICA STATUS PROPERTIES ("tablet_id" = "10003", "backend_id" = "10001", "status" = "bad");. If the number of BEs is less than or equal to the number of replicas, the corrupted replica cannot be repaired. #17876
After a BE is started, its process exists but the BE port cannot be enabled. #19347
Wrong results are returned for aggregate queries whose subquery is nested with a window function. #19725
auto_refresh_partitions_limit does not take effect when the materialized view (MV) is refreshed for the first time. As a result, all the partitions are refreshed. #19759
An error occurs when querying a CSV Hive external table whose array data is nested with complex data such as MAP and STRUCT. #20233
Queries that use Spark connector time out. #20264
If one replica of a two-replica table is corrupted, the table cannot recover. #20681
Query failure caused by MV query rewrite failure. #19549
The metric interface expires due to database lock. #20790
Wrong results are returned for Broadcast Join. #20952
NPE is returned when an unsupported data type is used in CREATE TABLE. #20999
The issue caused by using window_funnel() with the Query Cache feature. #21474
Optimization plan selection takes an unexpectedly long time after the CTE is rewritten. #16515

Releases: StarRocks/starrocks
