refactor(query): optimize system tables filter with lightweight permission check #19293

TCeason · 2026-01-19T11:26:41Z

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Optimize system tables (system.tables, system.columns) query performance when filtering by specific database/table names
Add lightweight permission check path that avoids loading all ownerships when querying specific objects
Refactor system table query logic with cleaner function separation

Results Summary (Main vs Branch)

Query                               Main      Branch    Speedup
system.columns (db+table filter)    5.616 s   0.121 s   ~46×
system.tables (db+name filter)      5.711 s   0.058 s   ~98×

Test SQL

  select *
  from system.columns
  where database='bench_db2' and table='t_10000';

  select count()
  from system.tables
  where database='bench_db2' and name='t_10000';

Notes

Non‑admin user (u1) with ~500k ownership keys.
Filters use exact database + table/name equality to hit the optimized path.
The branch skips full list_ownerships and performs lightweight checks, which is why the improvement is dramatic at scale.
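
The difference between the two paths can be sketched as follows. This is a minimal, synchronous model and not the code from this PR: `OwnershipStore`, `list_owned`, `owns`, and the two `visible_*` functions are hypothetical names, and the sketch models ownership only (no grants, no meta-service round trips).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the ownership records kept in the meta service.
/// Key: (database, table); value: the owning role.
struct OwnershipStore {
    owners: HashMap<(String, String), String>,
}

impl OwnershipStore {
    /// Slow path: materialize every object owned by `role`,
    /// analogous to a full ownership listing.
    fn list_owned(&self, role: &str) -> Vec<(String, String)> {
        self.owners
            .iter()
            .filter(|(_, owner)| owner.as_str() == role)
            .map(|(key, _)| key.clone())
            .collect()
    }

    /// Fast path: a single point lookup for the object named by the filter.
    fn owns(&self, role: &str, db: &str, table: &str) -> bool {
        self.owners
            .get(&(db.to_string(), table.to_string()))
            .map_or(false, |owner| owner == role)
    }
}

/// Check used when the query carries exact database + table equality filters.
fn visible_with_filter(store: &OwnershipStore, role: &str, db: &str, table: &str) -> bool {
    store.owns(role, db, table)
}

/// Check used without a usable filter: list everything, then test membership.
fn visible_without_filter(store: &OwnershipStore, role: &str, db: &str, table: &str) -> bool {
    store
        .list_owned(role)
        .iter()
        .any(|(d, t)| d == db && t == table)
}
```

Both functions return the same answer for a given object; the filtered variant simply avoids touching the other ~500k keys, which is where the speedup in the table above would come from under these assumptions.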

Tests

Unit Test
Logic Test
Benchmark Test
No Test

Type of change

Bug Fix (non-breaking change which fixes an issue)
New Feature (non-breaking change which adds functionality)
Breaking Change (fix or feature that could cause existing functionality not to work as expected)
Documentation Update
Refactoring
Performance Improvement
Other (please describe):

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a9f115c324

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

src/query/storages/system/src/columns_table.rs

TCeason · 2026-01-19T11:57:32Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5625e2431b

src/query/storages/system/src/databases_table.rs

src/query/storages/system/src/tables_table.rs

TCeason · 2026-01-19T13:07:30Z

@codex review

chatgpt-codex-connector · 2026-01-19T13:17:05Z

Codex Review: Didn't find any major issues. You're on a roll.

src/query/storages/system/src/columns_table.rs

src/query/users/src/visibility_checker.rs

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1f0f210f6

src/query/storages/system/src/util.rs

TCeason · 2026-01-20T10:57:00Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00f080dfdb

src/query/storages/system/src/util.rs

TCeason · 2026-01-20T11:12:58Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2dd253106

src/query/storages/system/src/util.rs

TCeason · 2026-01-20T11:42:24Z

@codex review

chatgpt-codex-connector · 2026-01-20T11:57:47Z

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

drmingdrmer

@drmingdrmer reviewed 10 files and all commit messages, and made 6 comments.
Reviewable status: 10 of 18 files reviewed, 8 unresolved discussions (waiting on @b41sh and @TCeason).

src/query/management/src/role/role_mgr.rs line 629 at r7 (raw file):

        &self,
        objects: &[OwnershipObject],
    ) -> databend_common_exception::Result<Vec<Option<OwnershipInfo>>> {

Never use ErrorCode as the returned error unless there is no more specific option.

src/query/management/src/role/role_mgr.rs line 668 at r7 (raw file):

                            warn!("Failed to deserialize ownership for key {}: {}", key, err);
                            results.push(None);
                        }

Why is such an error allowed to be ignored? A deserialization failure should be treated as damaged data. It is a severe error.

Code quote:

                        Err(err) => {
                            // If deserialization fails, treat as not found
                            warn!("Failed to deserialize ownership for key {}: {}", key, err);
                            results.push(None);
                        }
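
A minimal sketch of the behavior requested here, assuming a hypothetical `decode` function and `DecodeError` type in place of the real ownership deserializer: a missing value still maps to `None`, but a value that fails to decode aborts the whole batch with an error that names the offending key.

```rust
/// Hypothetical error type naming the key whose value could not be decoded.
#[derive(Debug, PartialEq)]
struct DecodeError {
    key: String,
    reason: String,
}

/// Hypothetical decoder standing in for ownership deserialization.
fn decode(raw: &str) -> Result<u64, String> {
    raw.parse::<u64>().map_err(|e| e.to_string())
}

/// Decode a batch of (key, optional raw value) pairs.
/// Absent values become `None`; corrupt values propagate as `Err`.
fn decode_batch(entries: &[(&str, Option<&str>)]) -> Result<Vec<Option<u64>>, DecodeError> {
    let mut results = Vec::with_capacity(entries.len());
    for (key, raw) in entries {
        match raw {
            // The key does not exist: this is a normal "not found".
            None => results.push(None),
            // The key exists: a decode failure means damaged data, so fail loudly.
            Some(bytes) => {
                let value = decode(bytes).map_err(|reason| DecodeError {
                    key: key.to_string(),
                    reason,
                })?;
                results.push(Some(value));
            }
        }
    }
    Ok(results)
}
```

Keeping "not found" and "corrupt" on separate paths means a damaged record cannot silently look like a missing ownership.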

src/query/catalog/src/database.rs line 135 at r7 (raw file):

        }
        Ok(tables)
    }

Does this function do what the doc comment says? For example, does it get the tables in a batch?

I don't see a reason to fall back to a one-by-one get.

Code quote:

    /// Get multiple tables by names in batch.
    /// Returns tables in the same order as input, skipping tables that are not found.
    #[async_backtrace::framed]
    async fn mget_tables(&self, table_names: &[String]) -> Result<Vec<Arc<dyn Table>>> {
        // Default implementation: fall back to sequential get_table calls
        let mut tables = Vec::with_capacity(table_names.len());
        for table_name in table_names {
            if let Ok(table) = self.get_table(table_name).await {
                tables.push(table);
            }
        }
        Ok(tables)
    }

src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):

            if !tables.is_empty() {
                return Ok(tables);
            }

I see several empty checks in this pull request. They don't provide any optimization, since most of the time the data is not empty. We can just remove these checks to keep the code clean.

Code quote:

            if !tables.is_empty() {
                return Ok(tables);
            }

src/query/catalog/src/catalog/interface.rs line 316 at r7 (raw file):

        }
        Ok(tables)
    }

I don't see why it provides a default implementation.

Code quote:

    /// Get multiple tables by db and table names in batch.
    /// Returns tables in the same order as the input table_names.
    /// If a table is not found, it will not be included in the result.
    async fn mget_tables(
        &self,
        tenant: &Tenant,
        db_name: &str,
        table_names: &[String],
    ) -> Result<Vec<Arc<dyn Table>>> {
        // Default implementation: fall back to sequential get_table calls
        let mut tables = Vec::with_capacity(table_names.len());
        for table_name in table_names {
            if let Ok(table) = self.get_table(tenant, db_name, table_name).await {
                tables.push(table);
            }
        }
        Ok(tables)
    }
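
The trade-off under discussion can be sketched with a synchronous toy trait. Everything here is hypothetical (`TableSource`, `SequentialSource`, `BatchSource`, the `round_trips` counter); it only illustrates that a default `mget_tables` costs one lookup per name, while an implementation that overrides it can answer the batch in one call.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a catalog/database trait.
trait TableSource {
    fn get_table(&self, name: &str) -> Option<String>;

    /// Default implementation: one `get_table` call per name,
    /// skipping names that are not found.
    fn mget_tables(&self, names: &[&str]) -> Vec<String> {
        names.iter().filter_map(|name| self.get_table(name)).collect()
    }
}

/// Keeps the default `mget_tables`, so a batch costs one round trip per name.
struct SequentialSource {
    tables: HashMap<String, String>,
    round_trips: Cell<usize>,
}

impl TableSource for SequentialSource {
    fn get_table(&self, name: &str) -> Option<String> {
        self.round_trips.set(self.round_trips.get() + 1);
        self.tables.get(name).cloned()
    }
}

/// Overrides `mget_tables`, so a batch costs a single round trip.
struct BatchSource {
    tables: HashMap<String, String>,
    round_trips: Cell<usize>,
}

impl TableSource for BatchSource {
    fn get_table(&self, name: &str) -> Option<String> {
        self.round_trips.set(self.round_trips.get() + 1);
        self.tables.get(name).cloned()
    }

    fn mget_tables(&self, names: &[&str]) -> Vec<String> {
        self.round_trips.set(self.round_trips.get() + 1);
        names
            .iter()
            .filter_map(|name| self.tables.get(*name).cloned())
            .collect()
    }
}

/// Small fixture shared by both sources.
fn sample_tables() -> HashMap<String, String> {
    let mut tables = HashMap::new();
    tables.insert("a".to_string(), "A".to_string());
    tables.insert("b".to_string(), "B".to_string());
    tables
}
```

With a default method, every implementor compiles without changes but may silently take the slow path; without one, each implementor has to decide explicitly how to serve a batch.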

src/meta/api/src/kv_fetch_util.rs line 171 at r7 (raw file):

    let str_keys: Vec<String> = keys.iter().map(|k| k.to_string_key()).collect();
    let seq_values = kv_api.mget_kv(&str_keys).await?;

Use KvApiExt::get_kv_stream() instead. mget_kv() allocates another Vec.

TCeason

@TCeason made 6 comments.
Reviewable status: 3 of 18 files reviewed, 8 unresolved discussions (waiting on @b41sh and @drmingdrmer).

src/meta/api/src/kv_fetch_util.rs line 171 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Use KvApiExt::get_kv_stream() instead. mget_kv() allocates another Vec.

Done

src/query/catalog/src/database.rs line 135 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Does this function do what the doc comment says? For example, does it get the tables in a batch?

I don't see a reason to fall back to a one-by-one get.

The Database trait has 5 implementations, and only DefaultDatabase supports mget, so I think the trait can fall back to one-by-one gets. I have added a comment.

src/query/catalog/src/catalog/interface.rs line 316 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

I don't see why it provides a default implementation.

The Catalog trait has 9 implementations, and only two can support mget. I have added a comment.

src/query/management/src/role/role_mgr.rs line 629 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Never use ErrorCode as the returned error unless there is no more specific option.

I have noticed this as well. Perhaps a better approach would be to fix these error-handling issues uniformly in a separate PR?

I created an issue to track them: #19308

src/query/management/src/role/role_mgr.rs line 668 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Why is such an error allowed to be ignored? A deserialization failure should be treated as damaged data. It is a severe error.

Now it will return Err.

src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

I see several empty checks in this pull request. They don't provide any optimization, since most of the time the data is not empty. We can just remove these checks to keep the code clean.

Done. I removed some checks. But in this case, I think it's better to keep it as it is.

    /// Get multiple tables by db_id and table names in batch.
    /// Returns TableInfo for tables that exist, in the same order as input.
    #[logcall::logcall]
    #[fastrace::trace]
    async fn mget_tables(
        &self,
        db_id: u64,
        db_name: &str,
        table_names: &[String],
    ) -> Result<Vec<Arc<TableInfo>>, KVAppError> {
        debug!(db_id = db_id, table_names :? = table_names; "TableApi: {}", func_name!());
        if table_names.is_empty() {
            return Ok(vec![]);
        }

drmingdrmer

@drmingdrmer reviewed 3 files and all commit messages, made 2 comments, and resolved 2 discussions.
Reviewable status: 6 of 18 files reviewed, 6 unresolved discussions (waiting on @b41sh and @TCeason).

src/query/catalog/src/database.rs line 135 at r7 (raw file):

Previously, TCeason wrote…

The Database trait has 5 implementations, and only DefaultDatabase supports mget, so I think the trait can fall back to one-by-one gets. I have added a comment.

Where does the assumption that "only DefaultDatabase supports mget" come from?

src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):

Previously, TCeason wrote…

Done. I removed some checks. But in this case, I think it's better to keep it as it is.

    /// Get multiple tables by db_id and table names in batch.
    /// Returns TableInfo for tables that exist, in the same order as input.
    #[logcall::logcall]
    #[fastrace::trace]
    async fn mget_tables(
        &self,
        db_id: u64,
        db_name: &str,
        table_names: &[String],
    ) -> Result<Vec<Arc<TableInfo>>, KVAppError> {
        debug!(db_id = db_id, table_names :? = table_names; "TableApi: {}", func_name!());
        if table_names.is_empty() {
            return Ok(vec![]);
        }

Please remove all of them. Otherwise, please estimate the probability that the input table list is empty, and quantify the performance gain.

TCeason

@TCeason made 2 comments.
Reviewable status: 5 of 18 files reviewed, 6 unresolved discussions (waiting on @b41sh and @drmingdrmer).

src/query/catalog/src/database.rs line 135 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Where does the assumption that "only DefaultDatabase supports mget" come from?

For example, in the Hive catalog:

https://github.com/datafuselabs/databend/blob/945df940312d60b7013658d145e6981b4b44eaea/src/query/storages/hive/hive/src/hive_catalog.rs#L471

Iceberg also uses a for ... get_table loop:

    #[async_backtrace::framed]
    async fn list_tables(&self) -> Result<Vec<Arc<dyn Table>>> {
        let table_names = self
            .ctl
            .iceberg_catalog()
            .list_tables(&self.ident)
            .await
            .map_err(|err| {
                ErrorCode::UnknownException(format!("Iceberg list tables failed: {err:?}"))
            })?;

        let mut tables = vec![];

        for table_name in table_names {
            let table = self.get_table(&table_name.name).await?;
            tables.push(table);
        }
        Ok(tables)
    }

So list_tables also loops with for ... get_table.

src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Please remove all of them. Otherwise, please estimate the probability that the input table list is empty, and quantify the performance gain.

Yes. In the code, the probability of "empty" is 0. It has been completely deleted.

github-actions · 2026-01-21T11:14:31Z

🤖 CI Job Analysis (Retry 1)

Workflow: 21236971674

📊 Summary

Total Jobs: 84
Failed Jobs: 4
Retryable: 0
Code Issues: 4

❌ NO RETRY NEEDED

All failures appear to be code/test issues requiring manual fixes.

🔍 Job Details

❌ linux / sqllogic / standalone_minio (query, hybrid, parquet): Not retryable (Code/Test)
❌ linux / sqllogic / standalone_minio (query, hybrid, native): Not retryable (Code/Test)
❌ linux / sqllogic / standalone (query, 4c, hybrid): Not retryable (Code/Test)
❌ linux / sqllogic / cluster (query, 4c, hybrid): Not retryable (Code/Test)

🤖 About

Automated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed).

drmingdrmer

@drmingdrmer reviewed 4 files and all commit messages, and resolved 1 discussion.
Reviewable status: 8 of 20 files reviewed, 5 unresolved discussions (waiting on @b41sh and @TCeason).

drmingdrmer

@drmingdrmer reviewed 8 files and all commit messages, and resolved 2 discussions.
Reviewable status: 14 of 26 files reviewed, 3 unresolved discussions (waiting on @b41sh and @TCeason).

TCeason

@TCeason made 1 comment.
Reviewable status: 14 of 26 files reviewed, 3 unresolved discussions (waiting on @b41sh and @drmingdrmer).

src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):

Previously, TCeason wrote…

Yes. In the code, the probability of "empty" is 0. It has been completely deleted.

Done

TCeason · 2026-01-22T07:58:22Z

Waiting for #19310 to merge.

TCeason marked this pull request as draft January 19, 2026 11:26

github-actions bot added the pr-refactor label (this PR changes the code base without new features or bugfix) Jan 19, 2026

chatgpt-codex-connector bot reviewed Jan 19, 2026

src/query/storages/system/src/columns_table.rs (outdated)

TCeason force-pushed the system_filter_optimize branch from a9f115c to 5625e24 Compare January 19, 2026 11:57

chatgpt-codex-connector bot reviewed Jan 19, 2026

src/query/storages/system/src/databases_table.rs (outdated)

src/query/storages/system/src/tables_table.rs (outdated)

TCeason force-pushed the system_filter_optimize branch from 5625e24 to d03f92a Compare January 19, 2026 12:43

TCeason force-pushed the system_filter_optimize branch 2 times, most recently from 72f7cc2 to cfb2557 Compare January 20, 2026 03:44

TCeason marked this pull request as ready for review January 20, 2026 03:44

TCeason requested a review from b41sh January 20, 2026 03:56

b41sh reviewed Jan 20, 2026

src/query/storages/system/src/columns_table.rs (outdated)

src/query/users/src/visibility_checker.rs

TCeason marked this pull request as draft January 20, 2026 09:59

TCeason force-pushed the system_filter_optimize branch from 412688e to f1f0f21 Compare January 20, 2026 10:21

TCeason marked this pull request as ready for review January 20, 2026 10:23

chatgpt-codex-connector bot reviewed Jan 20, 2026

src/query/storages/system/src/util.rs (outdated)

TCeason force-pushed the system_filter_optimize branch from f1f0f21 to 00f080d Compare January 20, 2026 10:56

chatgpt-codex-connector bot reviewed Jan 20, 2026

src/query/storages/system/src/util.rs (outdated)

TCeason force-pushed the system_filter_optimize branch from 00f080d to c2dd253 Compare January 20, 2026 11:12

chatgpt-codex-connector bot reviewed Jan 20, 2026

src/query/storages/system/src/util.rs

TCeason force-pushed the system_filter_optimize branch from c2dd253 to 0a48dfd Compare January 20, 2026 11:42

TCeason force-pushed the system_filter_optimize branch 2 times, most recently from a16f8a4 to 187bcf9 Compare January 20, 2026 12:55

b41sh approved these changes Jan 21, 2026

drmingdrmer requested changes Jan 21, 2026

TCeason commented Jan 21, 2026

TCeason requested a review from drmingdrmer January 21, 2026 05:26

drmingdrmer requested changes Jan 21, 2026

TCeason commented Jan 21, 2026

TCeason added 10 commits January 21, 2026 17:42

optimize system tables filter

c22641a

optimize

601d746

refactor

6b9a845

refactor

abc97c6

feat(query): optimize system tables filter with lightweight permissio…

89d0bcf

…n check

add test

3011d5a

add mget_tables|ownerships

1a12b64

fix conversation

daeba7f

remove all table_names empty check

8c1ac54

resolve conflicts

559d4ee

TCeason force-pushed the system_filter_optimize branch from 682184d to 559d4ee Compare January 21, 2026 10:25

TCeason requested a review from drmingdrmer January 21, 2026 10:25

drmingdrmer requested changes Jan 21, 2026

support mget_dbs

67177f1

TCeason force-pushed the system_filter_optimize branch 4 times, most recently from 48fbbdb to a56f5a3 Compare January 22, 2026 01:47

impl mget_tables in Catalog and database

5ad312b

TCeason force-pushed the system_filter_optimize branch from a56f5a3 to 5ad312b Compare January 22, 2026 04:14

drmingdrmer reviewed Jan 22, 2026

refactor(catalog): simplify mget_tables by trusting immutable catalog…

97f6eb1

… semantics Remove redundant is_empty check and get_database verification. When mget_tables returns Some, the database exists in immutable catalog.

TCeason commented Jan 22, 2026
