Skip to content

Conversation

@TCeason
Copy link
Collaborator

@TCeason TCeason commented Jan 19, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Optimize system tables (system.tables, system.columns) query performance when filtering by specific database/table names
  • Add lightweight permission check path that avoids loading all ownerships when querying specific objects
  • Refactor system table query logic with cleaner function separation

Results Summary (Main vs Branch)

Query Main Branch Speedup
system.columns (db+table filter) 5.616 s 0.121 s ~46×
system.tables (db+name filter) 5.711 s 0.058 s ~98×

Test SQL

  select *
  from system.columns
  where database='bench_db2' and table='t_10000';

  select count()
  from system.tables
  where database='bench_db2' and name='t_10000';

Notes

  • Non‑admin user (u1) with ~500k ownership keys.
  • Filters use exact database + table/name equality to hit the optimized path.
  • The branch skips full list_ownerships and performs lightweight checks, which is why the improvement is dramatic at scale.

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@TCeason TCeason marked this pull request as draft January 19, 2026 11:26
@github-actions github-actions bot added the pr-refactor this PR changes the code base without new features or bugfix label Jan 19, 2026
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a9f115c324

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@TCeason TCeason force-pushed the system_filter_optimize branch from a9f115c to 5625e24 Compare January 19, 2026 11:57
@TCeason
Copy link
Collaborator Author

TCeason commented Jan 19, 2026

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5625e2431b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@TCeason TCeason force-pushed the system_filter_optimize branch from 5625e24 to d03f92a Compare January 19, 2026 12:43
@TCeason
Copy link
Collaborator Author

TCeason commented Jan 19, 2026

@codex review

@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@TCeason TCeason force-pushed the system_filter_optimize branch 2 times, most recently from 72f7cc2 to cfb2557 Compare January 20, 2026 03:44
@TCeason TCeason marked this pull request as ready for review January 20, 2026 03:44
@TCeason TCeason requested a review from b41sh January 20, 2026 03:56
@TCeason TCeason marked this pull request as draft January 20, 2026 09:59
@TCeason TCeason force-pushed the system_filter_optimize branch from 412688e to f1f0f21 Compare January 20, 2026 10:21
@TCeason TCeason marked this pull request as ready for review January 20, 2026 10:23
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1f0f210f6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@TCeason TCeason force-pushed the system_filter_optimize branch from f1f0f21 to 00f080d Compare January 20, 2026 10:56
@TCeason
Copy link
Collaborator Author

TCeason commented Jan 20, 2026

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00f080dfdb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@TCeason TCeason force-pushed the system_filter_optimize branch from 00f080d to c2dd253 Compare January 20, 2026 11:12
@TCeason
Copy link
Collaborator Author

TCeason commented Jan 20, 2026

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2dd253106

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@TCeason TCeason force-pushed the system_filter_optimize branch from c2dd253 to 0a48dfd Compare January 20, 2026 11:42
@TCeason
Copy link
Collaborator Author

TCeason commented Jan 20, 2026

@codex review

@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@TCeason TCeason force-pushed the system_filter_optimize branch 2 times, most recently from a16f8a4 to 187bcf9 Compare January 20, 2026 12:55
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drmingdrmer reviewed 10 files and all commit messages, and made 6 comments.
Reviewable status: 10 of 18 files reviewed, 8 unresolved discussions (waiting on @b41sh and @TCeason).


src/query/management/src/role/role_mgr.rs line 629 at r7 (raw file):

        &self,
        objects: &[OwnershipObject],
    ) -> databend_common_exception::Result<Vec<Option<OwnershipInfo>>> {

Never use the ErrorCode as returning error unless there an option that is more specific.


src/query/management/src/role/role_mgr.rs line 668 at r7 (raw file):

                            warn!("Failed to deserialize ownership for key {}: {}", key, err);
                            results.push(None);
                        }

Why is such an error allowed to ignore? Deization failure should be considered as a a damaged data. It is a severe error.

Code quote:

                        Err(err) => {
                            // If deserialization fails, treat as not found
                            warn!("Failed to deserialize ownership for key {}: {}", key, err);
                            results.push(None);
                        }

src/query/catalog/src/database.rs line 135 at r7 (raw file):

        }
        Ok(tables)
    }

Does this function do as the doc comment said? for example, does it get tables in the batch?

I don't see there is a reason to fall back to a one by one get

Code quote:

    /// Get multiple tables by names in batch.
    /// Returns tables in the same order as input, skipping tables that are not found.
    #[async_backtrace::framed]
    async fn mget_tables(&self, table_names: &[String]) -> Result<Vec<Arc<dyn Table>>> {
        // Default implementation: fall back to sequential get_table calls
        let mut tables = Vec::with_capacity(table_names.len());
        for table_name in table_names {
            if let Ok(table) = self.get_table(table_name).await {
                tables.push(table);
            }
        }
        Ok(tables)
    }

src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):

            if !tables.is_empty() {
                return Ok(tables);
            }

I see a several empty check in this pull request, and it doesn't provide any optimization since most of the time the the data is not empty. And we can just remove this check to make the code clean.

Code quote:

            if !tables.is_empty() {
                return Ok(tables);
            }

src/query/catalog/src/catalog/interface.rs line 316 at r7 (raw file):

        }
        Ok(tables)
    }

I don't see why it provides a default implementation.

Code quote:

    /// Get multiple tables by db and table names in batch.
    /// Returns tables in the same order as the input table_names.
    /// If a table is not found, it will not be included in the result.
    async fn mget_tables(
        &self,
        tenant: &Tenant,
        db_name: &str,
        table_names: &[String],
    ) -> Result<Vec<Arc<dyn Table>>> {
        // Default implementation: fall back to sequential get_table calls
        let mut tables = Vec::with_capacity(table_names.len());
        for table_name in table_names {
            if let Ok(table) = self.get_table(tenant, db_name, table_name).await {
                tables.push(table);
            }
        }
        Ok(tables)
    }

src/meta/api/src/kv_fetch_util.rs line 171 at r7 (raw file):

    let str_keys: Vec<String> = keys.iter().map(|k| k.to_string_key()).collect();
    let seq_values = kv_api.mget_kv(&str_keys).await?;

use KvApiExt::get_kv_stream() instead. mget_kv() allocates another vec.

Copy link
Collaborator Author

@TCeason TCeason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TCeason made 6 comments.
Reviewable status: 3 of 18 files reviewed, 8 unresolved discussions (waiting on @b41sh and @drmingdrmer).


src/meta/api/src/kv_fetch_util.rs line 171 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

use KvApiExt::get_kv_stream() instead. mget_kv() allocates another vec.

Done


src/query/catalog/src/database.rs line 135 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Does this function do as the doc comment said? for example, does it get tables in the batch?

I don't see there is a reason to fall back to a one by one get

trait Database has 5 impl. And only DefaultDatabase support mget. So I think the trait can one by one get. Now I add some comment.


src/query/catalog/src/catalog/interface.rs line 316 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

I don't see why it provides a default implementation.

trait Catalog has 9 impl. Only two can support mget. Now I add some comment.


src/query/management/src/role/role_mgr.rs line 629 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Never use the ErrorCode as returning error unless there an option that is more specific.

I have also noticed this. But perhaps a better approach would be to uniformly correct these error handling issues in another PR?

I create an issue to focus on solving them. #19308


src/query/management/src/role/role_mgr.rs line 668 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Why is such an error allowed to ignore? Deization failure should be considered as a a damaged data. It is a severe error.

Now it will return Err.


src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

I see a several empty check in this pull request, and it doesn't provide any optimization since most of the time the the data is not empty. And we can just remove this check to make the code clean.

Done. I removed some checks. But in this case, I think it's better to keep it as it is.

/// Get multiple tables by db_id and table names in batch.
/// Returns TableInfo for tables that exist, in the same order as input.
#[logcall::logcall]
#[fastrace::trace]
async fn mget_tables(
&self,
db_id: u64,
db_name: &str,
table_names: &[String],
) -> Result<Vec<Arc<TableInfo>>, KVAppError> {
debug! (db_id = db_id, table_names :? = table_names; "TableApi: {}", func_name! ());
if table_names.is_empty() {
return Ok(vec! []);
}

@TCeason TCeason requested a review from drmingdrmer January 21, 2026 05:26
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drmingdrmer reviewed 3 files and all commit messages, made 2 comments, and resolved 2 discussions.
Reviewable status: 6 of 18 files reviewed, 6 unresolved discussions (waiting on @b41sh and @TCeason).


src/query/catalog/src/database.rs line 135 at r7 (raw file):

Previously, TCeason wrote…

trait Database has 5 impl. And only DefaultDatabase support mget. So I think the trait can one by one get. Now I add some comment.

only DefaultDatabase support mget where is this assumption from?


src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):

Previously, TCeason wrote…

Done. I removed some checks. But in this case, I think it's better to keep it as it is.

/// Get multiple tables by db_id and table names in batch.
/// Returns TableInfo for tables that exist, in the same order as input.
#[logcall::logcall]
#[fastrace::trace]
async fn mget_tables(
&self,
db_id: u64,
db_name: &str,
table_names: &[String],
) -> Result<Vec<Arc<TableInfo>>, KVAppError> {
debug! (db_id = db_id, table_names :? = table_names; "TableApi: {}", func_name! ());
if table_names.is_empty() {
return Ok(vec! []);
}

Remove all of them please. Otherwise, please estimate the chance that the input tables is empty. And estimate the performance gain, in number.

Copy link
Collaborator Author

@TCeason TCeason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TCeason made 2 comments.
Reviewable status: 5 of 18 files reviewed, 6 unresolved discussions (waiting on @b41sh and @drmingdrmer).


src/query/catalog/src/database.rs line 135 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

only DefaultDatabase support mget where is this assumption from?

e.g. In hive catalog

https://github.com/datafuselabs/databend/blob/945df940312d60b7013658d145e6981b4b44eaea/src/query/storages/hive/hive/src/hive_catalog.rs#L471

#[async_backtrace::framed]
    async fn list_tables(&self) -> Result<Vec<Arc<dyn Table>>> {
        let table_names = self
            .ctl
            .iceberg_catalog()
            .list_tables(&self.ident)
            .await
            .map_err(|err| {
                ErrorCode::UnknownException(format!("Iceberg list tables failed: {err:?}"))
            })?;

        let mut tables = vec![];

        for table_name in table_names {
            let table = self.get_table(&table_name.name).await?;
            tables.push(table);
        }
        Ok(tables)
    }

In Iceberg also use for ... get_table

#[async_backtrace::framed]
    async fn list_tables(&self) -> Result<Vec<Arc<dyn Table>>> {
        let table_names = self
            .ctl
            .iceberg_catalog()
            .list_tables(&self.ident)
            .await
            .map_err(|err| {
                ErrorCode::UnknownException(format!("Iceberg list tables failed: {err:?}"))
            })?;

        let mut tables = vec![];

        for table_name in table_names {
            let table = self.get_table(&table_name.name).await?;
            tables.push(table);
        }
        Ok(tables)
    }

The list table also for ... get_table.


src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Remove all of them please. Otherwise, please estimate the chance that the input tables is empty. And estimate the performance gain, in number.

Yes. In the code, the probability of "empty" is 0. It has been completely deleted.

@TCeason TCeason force-pushed the system_filter_optimize branch from 682184d to 559d4ee Compare January 21, 2026 10:25
@TCeason TCeason requested a review from drmingdrmer January 21, 2026 10:25
@github-actions
Copy link
Contributor

github-actions bot commented Jan 21, 2026

🤖 CI Job Analysis (Retry 1)

Workflow: 21236971674

📊 Summary

  • Total Jobs: 84
  • Failed Jobs: 4
  • Retryable: 0
  • Code Issues: 4

NO RETRY NEEDED

All failures appear to be code/test issues requiring manual fixes.

🔍 Job Details

  • linux / sqllogic / standalone_minio (query, hybrid, parquet): Not retryable (Code/Test)
  • linux / sqllogic / standalone_minio (query, hybrid, native): Not retryable (Code/Test)
  • linux / sqllogic / standalone (query, 4c, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / cluster (query, 4c, hybrid): Not retryable (Code/Test)

🤖 About

Automated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed).

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drmingdrmer reviewed 4 files and all commit messages, and resolved 1 discussion.
Reviewable status: 8 of 20 files reviewed, 5 unresolved discussions (waiting on @b41sh and @TCeason).

@TCeason TCeason force-pushed the system_filter_optimize branch 4 times, most recently from 48fbbdb to a56f5a3 Compare January 22, 2026 01:47
@TCeason TCeason force-pushed the system_filter_optimize branch from a56f5a3 to 5ad312b Compare January 22, 2026 04:14
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drmingdrmer reviewed 8 files and all commit messages, and resolved 2 discussions.
Reviewable status: 14 of 26 files reviewed, 3 unresolved discussions (waiting on @b41sh and @TCeason).

… semantics

  Remove redundant is_empty check and get_database verification.
  When mget_tables returns Some, the database exists in immutable catalog.
Copy link
Collaborator Author

@TCeason TCeason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TCeason made 1 comment.
Reviewable status: 14 of 26 files reviewed, 3 unresolved discussions (waiting on @b41sh and @drmingdrmer).


src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):

Previously, TCeason wrote…

Yes. In the code, the probability of "empty" is 0. It has been completely deleted.

Done

@TCeason
Copy link
Collaborator Author

TCeason commented Jan 22, 2026

wait #19310 merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-refactor this PR changes the code base without new features or bugfix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants