Add deltalake source and sink for topsql #63
base: 0.49
Signed-off-by: yibin87 <[email protected]>
6532cd4 to 9dc07fd
```rust
use std::path::PathBuf;

use vector::{
    aws::{AwsAuthentication, RegionOrEndpoint},
    // … (rest of the import list not shown in the review excerpt)
};
```
Do we currently support only AWS and Alibaba Cloud?
Yes, that's right.
The meta and data information differ only in storage format, yet most of their mod.rs is duplicated. Is it necessary to keep them separate?
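Good point. The benefit of keeping them separate for now is that the two kinds of data are fully isolated, so their batches don't affect each other; the actual processing logic of the two parts also differs. Later we could extract the per-cloud adaptation logic into its own module, the way deltalake_writer is, at which point the two mods would be little more than thin shells.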
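A minimal sketch of that direction, assuming a hypothetical `CloudTableWriter` trait that holds the per-cloud adaptation logic (standing in for something like `deltalake_writer`) and a generic `BatchingSink` shell that each sink instantiates with its own buffer, so data and meta batches stay independent. `Provider`, `ObjectStoreWriter`, the `s3://`/`oss://` URI schemes, and the one-table-per-sink simplification are all assumptions for illustration, not code from this PR; the size and age thresholds only mirror the meta sink's `max_delay_secs` and `EVENT_BUFFER_MAX_SIZE` settings from the PR description.

```rust
use std::time::{Duration, Instant};

/// Hypothetical boundary for the per-cloud adaptation logic that could be
/// extracted into a shared module (in the spirit of `deltalake_writer`).
pub trait CloudTableWriter {
    /// Build the storage URI for a Delta Lake table.
    fn table_uri(&self, table: &str) -> String;
    /// Persist one batch of rows to the given table.
    fn write_batch(&mut self, table: &str, rows: Vec<String>);
}

/// The two providers discussed in review; the URI schemes are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Provider {
    Aws,
    AliCloud,
}

impl Provider {
    fn scheme(self) -> &'static str {
        match self {
            Provider::Aws => "s3",
            Provider::AliCloud => "oss",
        }
    }
}

/// Toy writer that records what it would have written instead of
/// committing a real Delta Lake transaction.
pub struct ObjectStoreWriter {
    pub provider: Provider,
    pub bucket: String,
    pub written: Vec<(String, usize)>,
}

impl CloudTableWriter for ObjectStoreWriter {
    fn table_uri(&self, table: &str) -> String {
        format!("{}://{}/{}", self.provider.scheme(), self.bucket, table)
    }

    fn write_batch(&mut self, table: &str, rows: Vec<String>) {
        let uri = self.table_uri(table);
        self.written.push((uri, rows.len()));
    }
}

/// Thin sink "shell": each instance owns its buffer, so the batches of
/// one sink never mix with another's.
pub struct BatchingSink<W: CloudTableWriter> {
    writer: W,
    table: String,
    buffer: Vec<String>,
    max_size: usize,
    max_delay: Duration,
    last_flush: Instant,
}

impl<W: CloudTableWriter> BatchingSink<W> {
    pub fn new(writer: W, table: &str, max_size: usize, max_delay: Duration) -> Self {
        Self {
            writer,
            table: table.to_string(),
            buffer: Vec::new(),
            max_size,
            max_delay,
            last_flush: Instant::now(),
        }
    }

    /// Buffer a row and flush once the size or age threshold is reached.
    /// The age check only runs on push here; a real sink would also need
    /// a timer so that an idle buffer still gets flushed.
    pub fn push(&mut self, row: String) {
        self.buffer.push(row);
        if self.buffer.len() >= self.max_size || self.last_flush.elapsed() >= self.max_delay {
            self.flush();
        }
    }

    /// Hand any buffered rows to the writer as a single batch.
    pub fn flush(&mut self) {
        if !self.buffer.is_empty() {
            let rows = std::mem::take(&mut self.buffer);
            self.writer.write_batch(&self.table, rows);
        }
        self.last_flush = Instant::now();
    }

    pub fn writer(&self) -> &W {
        &self.writer
    }
}
```

With this split, the data sink and the meta sink would each hold their own `BatchingSink`, and supporting another cloud would only touch the writer side.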
The format in the current version shouldn't need to change, right? Why add new fields?
Updated. Now only the changes needed to make it compile, plus some test changes, remain; the existing TopSQL data sent to VM is completely unchanged.

TopSQL v2 Source and Delta Lake Sinks Implementation
Overview
This PR implements a complete TopSQL data collection and storage solution, including a TopSQL v2 source and two Delta Lake sinks: one for TopSQL data and one for TopSQL metadata.
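A minimal sketch of how events emitted by `topsql_v2` could be divided between the two sinks. It assumes each event carries a `source_table` string naming one of the five output tables listed under Data Output. The PR states that the data sink routes on `source_table`; that the meta sink keys on the same field is an assumption here. `Destination` and `route` are hypothetical names, not part of the PR.

```rust
/// Which sink, and which Delta Lake table, an event belongs to.
#[derive(Debug, PartialEq, Eq)]
pub enum Destination {
    /// Tables written by the `topsql_data_deltalake` sink.
    Data(&'static str),
    /// Tables written by the `topsql_meta_deltalake` sink.
    Meta(&'static str),
}

/// Output tables of the `topsql_v2` source, grouped by the sink that stores them.
const DATA_TABLES: [&str; 3] = ["tidb_topsql", "tikv_topsql", "tikv_topregion"];
const META_TABLES: [&str; 2] = ["topsql_sql_meta", "topsql_plan_meta"];

/// Map an event's `source_table` value to its destination; unknown
/// tables return `None` so the caller can drop or log them.
pub fn route(source_table: &str) -> Option<Destination> {
    DATA_TABLES
        .iter()
        .copied()
        .find(|t| *t == source_table)
        .map(Destination::Data)
        .or_else(|| {
            META_TABLES
                .iter()
                .copied()
                .find(|t| *t == source_table)
                .map(Destination::Meta)
        })
}
```

Keeping the two table lists disjoint makes it explicit that the data and meta sinks never write to the same table.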
Key Features
1. TopSQL v2 Source (`topsql_v2`)

Data Output

- `tidb_topsql` table containing SQL execution statistics
- `tikv_topsql` table containing TiKV-level SQL statistics
- `tikv_topregion` table containing Region-level statistics
- `topsql_sql_meta` table containing SQL text and metadata
- `topsql_plan_meta` table containing execution plan information

Configuration Options
2. TopSQL Data Delta Lake Sink (`topsql_data_deltalake`)

Core Features
- Routes data to the corresponding Delta Lake table based on the `source_table` field

Supported Data Types
- `tidb_topsql`: TiDB TopSQL data
- `tikv_topsql`: TiKV TopSQL data
- `tikv_topregion`: TiKV TopRegion data

Schema Definition
Contains the complete set of TopSQL metric fields:
3. TopSQL Meta Delta Lake Sink (`topsql_meta_deltalake`)

Core Features
- Key format `{digest}_{date}` (e.g., `sql_digest_2024-01-01`)
- Configurable maximum delay (`max_delay_secs`, default 180 seconds)
- Bounded event buffer (`EVENT_BUFFER_MAX_SIZE`, default 1000)

Supported Data Types
- `topsql_sql_meta`: SQL metadata (`normalized_sql`, `sql_digest`, etc.)
- `topsql_plan_meta`: Plan metadata (`normalized_plan`, `encoded_normalized_plan`, etc.)

Configuration Examples
Complete Configuration Example