-
Notifications
You must be signed in to change notification settings - Fork 849
refactor(query): optimize system tables filter with lightweight permission check #19293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a9f115c324
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
a9f115c to
5625e24
Compare
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5625e2431b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
5625e24 to
d03f92a
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. You're on a roll. ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
72f7cc2 to
cfb2557
Compare
412688e to
f1f0f21
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1f0f210f6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
f1f0f21 to
00f080d
Compare
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00f080dfdb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
00f080d to
c2dd253
Compare
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2dd253106
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
c2dd253 to
0a48dfd
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. Already looking forward to the next diff. ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
a16f8a4 to
187bcf9
Compare
drmingdrmer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drmingdrmer reviewed 10 files and all commit messages, and made 6 comments.
Reviewable status: 10 of 18 files reviewed, 8 unresolved discussions (waiting on @b41sh and @TCeason).
src/query/management/src/role/role_mgr.rs line 629 at r7 (raw file):
&self, objects: &[OwnershipObject], ) -> databend_common_exception::Result<Vec<Option<OwnershipInfo>>> {
Never use the ErrorCode as returning error unless there an option that is more specific.
src/query/management/src/role/role_mgr.rs line 668 at r7 (raw file):
warn!("Failed to deserialize ownership for key {}: {}", key, err); results.push(None); }
Why is such an error allowed to ignore? Deization failure should be considered as a a damaged data. It is a severe error.
Code quote:
Err(err) => {
// If deserialization fails, treat as not found
warn!("Failed to deserialize ownership for key {}: {}", key, err);
results.push(None);
}src/query/catalog/src/database.rs line 135 at r7 (raw file):
} Ok(tables) }
Does this function do as the doc comment said? for example, does it get tables in the batch?
I don't see there is a reason to fall back to a one by one get
Code quote:
/// Get multiple tables by names in batch.
/// Returns tables in the same order as input, skipping tables that are not found.
#[async_backtrace::framed]
async fn mget_tables(&self, table_names: &[String]) -> Result<Vec<Arc<dyn Table>>> {
// Default implementation: fall back to sequential get_table calls
let mut tables = Vec::with_capacity(table_names.len());
for table_name in table_names {
if let Ok(table) = self.get_table(table_name).await {
tables.push(table);
}
}
Ok(tables)
}src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):
if !tables.is_empty() { return Ok(tables); }
I see a several empty check in this pull request, and it doesn't provide any optimization since most of the time the the data is not empty. And we can just remove this check to make the code clean.
Code quote:
if !tables.is_empty() {
return Ok(tables);
}src/query/catalog/src/catalog/interface.rs line 316 at r7 (raw file):
} Ok(tables) }
I don't see why it provides a default implementation.
Code quote:
/// Get multiple tables by db and table names in batch.
/// Returns tables in the same order as the input table_names.
/// If a table is not found, it will not be included in the result.
async fn mget_tables(
&self,
tenant: &Tenant,
db_name: &str,
table_names: &[String],
) -> Result<Vec<Arc<dyn Table>>> {
// Default implementation: fall back to sequential get_table calls
let mut tables = Vec::with_capacity(table_names.len());
for table_name in table_names {
if let Ok(table) = self.get_table(tenant, db_name, table_name).await {
tables.push(table);
}
}
Ok(tables)
}src/meta/api/src/kv_fetch_util.rs line 171 at r7 (raw file):
let str_keys: Vec<String> = keys.iter().map(|k| k.to_string_key()).collect(); let seq_values = kv_api.mget_kv(&str_keys).await?;
use KvApiExt::get_kv_stream() instead. mget_kv() allocates another vec.
TCeason
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TCeason made 6 comments.
Reviewable status: 3 of 18 files reviewed, 8 unresolved discussions (waiting on @b41sh and @drmingdrmer).
src/meta/api/src/kv_fetch_util.rs line 171 at r7 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
use KvApiExt::get_kv_stream() instead. mget_kv() allocates another vec.
Done
src/query/catalog/src/database.rs line 135 at r7 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
Does this function do as the doc comment said? for example, does it get tables in the batch?
I don't see there is a reason to fall back to a one by one get
trait Database has 5 impl. And only DefaultDatabase support mget. So I think the trait can one by one get. Now I add some comment.
src/query/catalog/src/catalog/interface.rs line 316 at r7 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
I don't see why it provides a default implementation.
trait Catalog has 9 impl. Only two can support mget. Now I add some comment.
src/query/management/src/role/role_mgr.rs line 629 at r7 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
Never use the
ErrorCodeas returning error unless there an option that is more specific.
I have also noticed this. But perhaps a better approach would be to uniformly correct these error handling issues in another PR?
I create an issue to focus on solving them. #19308
src/query/management/src/role/role_mgr.rs line 668 at r7 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
Why is such an error allowed to ignore? Deization failure should be considered as a a damaged data. It is a severe error.
Now it will return Err.
src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
I see a several empty check in this pull request, and it doesn't provide any optimization since most of the time the the data is not empty. And we can just remove this check to make the code clean.
Done. I removed some checks. But in this case, I think it's better to keep it as it is.
/// Get multiple tables by db_id and table names in batch.
/// Returns TableInfo for tables that exist, in the same order as input.
#[logcall::logcall]
#[fastrace::trace]
async fn mget_tables(
&self,
db_id: u64,
db_name: &str,
table_names: &[String],
) -> Result<Vec<Arc<TableInfo>>, KVAppError> {
debug! (db_id = db_id, table_names :? = table_names; "TableApi: {}", func_name! ());
if table_names.is_empty() {
return Ok(vec! []);
}
drmingdrmer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drmingdrmer reviewed 3 files and all commit messages, made 2 comments, and resolved 2 discussions.
Reviewable status: 6 of 18 files reviewed, 6 unresolved discussions (waiting on @b41sh and @TCeason).
src/query/catalog/src/database.rs line 135 at r7 (raw file):
Previously, TCeason wrote…
trait Database has 5 impl. And only DefaultDatabase support mget. So I think the trait can one by one get. Now I add some comment.
only DefaultDatabase support mget where is this assumption from?
src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):
Previously, TCeason wrote…
Done. I removed some checks. But in this case, I think it's better to keep it as it is.
/// Get multiple tables by db_id and table names in batch. /// Returns TableInfo for tables that exist, in the same order as input. #[logcall::logcall] #[fastrace::trace] async fn mget_tables( &self, db_id: u64, db_name: &str, table_names: &[String], ) -> Result<Vec<Arc<TableInfo>>, KVAppError> { debug! (db_id = db_id, table_names :? = table_names; "TableApi: {}", func_name! ()); if table_names.is_empty() { return Ok(vec! []); }
Remove all of them please. Otherwise, please estimate the chance that the input tables is empty. And estimate the performance gain, in number.
TCeason
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TCeason made 2 comments.
Reviewable status: 5 of 18 files reviewed, 6 unresolved discussions (waiting on @b41sh and @drmingdrmer).
src/query/catalog/src/database.rs line 135 at r7 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
only DefaultDatabase support mgetwhere is this assumption from?
e.g. In hive catalog
#[async_backtrace::framed]
async fn list_tables(&self) -> Result<Vec<Arc<dyn Table>>> {
let table_names = self
.ctl
.iceberg_catalog()
.list_tables(&self.ident)
.await
.map_err(|err| {
ErrorCode::UnknownException(format!("Iceberg list tables failed: {err:?}"))
})?;
let mut tables = vec![];
for table_name in table_names {
let table = self.get_table(&table_name.name).await?;
tables.push(table);
}
Ok(tables)
}In Iceberg also use for ... get_table
#[async_backtrace::framed]
async fn list_tables(&self) -> Result<Vec<Arc<dyn Table>>> {
let table_names = self
.ctl
.iceberg_catalog()
.list_tables(&self.ident)
.await
.map_err(|err| {
ErrorCode::UnknownException(format!("Iceberg list tables failed: {err:?}"))
})?;
let mut tables = vec![];
for table_name in table_names {
let table = self.get_table(&table_name.name).await?;
tables.push(table);
}
Ok(tables)
}The list table also for ... get_table.
src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
Remove all of them please. Otherwise, please estimate the chance that the input tables is empty. And estimate the performance gain, in number.
Yes. In the code, the probability of "empty" is 0. It has been completely deleted.
682184d to
559d4ee
Compare
🤖 CI Job Analysis (Retry 1)
📊 Summary
❌ NO RETRY NEEDEDAll failures appear to be code/test issues requiring manual fixes. 🔍 Job Details
🤖 AboutAutomated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed). |
drmingdrmer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drmingdrmer reviewed 4 files and all commit messages, and resolved 1 discussion.
Reviewable status: 8 of 20 files reviewed, 5 unresolved discussions (waiting on @b41sh and @TCeason).
48fbbdb to
a56f5a3
Compare
a56f5a3 to
5ad312b
Compare
drmingdrmer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drmingdrmer reviewed 8 files and all commit messages, and resolved 2 discussions.
Reviewable status: 14 of 26 files reviewed, 3 unresolved discussions (waiting on @b41sh and @TCeason).
… semantics Remove redundant is_empty check and get_database verification. When mget_tables returns Some, the database exists in immutable catalog.
TCeason
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TCeason made 1 comment.
Reviewable status: 14 of 26 files reviewed, 3 unresolved discussions (waiting on @b41sh and @drmingdrmer).
src/query/service/src/catalogs/default/database_catalog.rs line 428 at r7 (raw file):
Previously, TCeason wrote…
Yes. In the code, the probability of "empty" is 0. It has been completely deleted.
Done
|
wait #19310 merge |
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
system.tables,system.columns) query performance when filtering by specific database/table namesResults Summary (Main vs Branch)
Test SQL
Notes
Tests
Type of change
This change is