-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[clickhouse] Introduce clients for write/read path #7082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[clickhouse] Introduce clients for write/read path #7082
Conversation
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (47.56%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #7082 +/- ##
==========================================
- Coverage 96.14% 95.94% -0.20%
==========================================
Files 358 363 +5
Lines 21585 21667 +82
==========================================
+ Hits 20753 20789 +36
- Misses 627 672 +45
- Partials 205 206 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Username string `mapstructure:"username"` | ||
Password string `mapstructure:"password"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the only form of auth CH supports? What about TLS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both of them are supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point was that we usually have auth:
section in the config with different options, plain text username/password is one of the options. Since you don't actually have code here that uses configuration to create CH connection objects it's hard to tell. Why not have Factory and Config in one PR, instead of reader/writer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect the configuration to be extensible:
extensions:
jaeger_storage:
clickhouse:
servers: [...]
auth:
basic:
username:
password:
} | ||
|
||
// ClientConfig holds the client configuration used to connect to the server within the chpool. | ||
type ClientConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this different from ClickhouseConfig?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main difference is that clickhouse-go supports load balancing and fault tolerance, while ch-go does not. When in use, clickhouse-go accepts an array of addresses, whereas ch-go only accepts a single address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore, if nest them together, only use single-node mode here. If there's a need to use multi-node, then ch-go support would be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When in use, clickhouse-go accepts an array of addresses, whereas ch-go only accepts a single address.
This is an internal implementation issue, it doesn't mean the end user needs to be exposed to this dichotomy.
The main difference is that clickhouse-go supports load balancing and fault tolerance, while ch-go does not.
This sounds like it should rule out ch-go for us, does it not? Connecting to a single server isn't a fault tolerant setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, if the user writes only servers: ['localhost:9000']
in the configuration, then ch-go will also use it. However, if it's servers: [a:1000, b:2000, c:3000]
, then we randomly pick one to give to ch-go.
0333ce1
to
74255e2
Compare
func NewFactory(ctx context.Context, cfg Config, poolFn newPoolFn, connFn newConnFn) (*Factory, error) { | ||
f := &Factory{ | ||
cfg: cfg, | ||
poolFn: poolFn, | ||
connFn: connFn, | ||
} | ||
err := f.initConnections(ctx) | ||
return f, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's difficult to test functions with the signature context.Context, cfg Config
because this function establishes a real connection with the ClickHouse server through initConnections
, which requires us to provide a real server environment. The initConnections function plays a crucial role in this process. It relies on accurate context and configuration information to connect to the server, and providing such a real - world environment for testing is challenging.
I've looked into the testing methods of other libraries. For example, clickhouse-go uses TestContainers to provide a real testing environment, and ch-go uses a real ClickHouse server in its E2E tests. However, implementing these methods in our testing scenario also presents certain difficulties.
Due to the above - mentioned testing challenges, it's not suitable to design the function signature as context.Context, cfg Config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can take inspiration from Elasticsearch. We can create client abstractions which can be used to mock the response from clients for code coverage in unit tests. The behaviour with real click-house server can be tested in E2E Tests by following the same steps as that of other storage. Regarding signature, something similar is happening in ES getClient()
function, similar approach can be employed here to remove poolFn and connFn form the function parameters. I hope this will help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhengkezhou1 con't do connections in constructor, constructor may only validate parameters, but better when it doesn't return errors at all. Have a Start
function that actually tries to create connection. It can do that via functions captured in the struct by the constructor, which allows you to swap those functions in the unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @Manik2708 says, using an abstraction client makes testing easy. I think bringing back the wrapper is a good choice.
} | ||
|
||
// NewTraceReader creates a TraceReader instance, using the connection to get traces from ClickHouse. | ||
func NewTraceReader(conn client.Conn) *TraceReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constructor should be the first function in the file, after struct definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read more about chpool driver and I think we need to stay away from it, only use clickhouse-go. The fact that chpool does not accept multiple addresses is just a big bottleneck, the write path is the most intensive for traces but the driver provides no load balancing.
Username string `mapstructure:"username"` | ||
Password string `mapstructure:"password"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect the configuration to be extensible:
extensions:
jaeger_storage:
clickhouse:
servers: [...]
auth:
basic:
username:
password:
OK, I will drop it in another PR. |
Signed-off-by: zhengkezhou1 <[email protected]>
Signed-off-by: zhengkezhou1 <[email protected]>
65a2616
to
24dddb0
Compare
Hey @zhengkezhou1! I'm jumping in to help drive the effort to bring ClickHouse support to Jaeger. Thanks so much for all your contributions so far! I had a chance to review the open PRs, and I think it would help to take a more incremental approach so we can keep the momentum going and make the review process smoother. To that end, I’ve scoped out the overall effort and broken it down into smaller sub-issues under the main tracking issue: #5058. To get started, it probably makes the most sense to begin at the foundation: implementing the Reader and Writer. Once we have those in place and in good shape, we can move on to the higher-level work like the factory, configuration, and documentation. If you're still interested in contributing, feel free to let me know which sub-issues you'd like to tackle. I’d be happy to coordinate and split up the work so we can move efficiently and avoid overlap. Maybe we can start by splitting the work for the reader and writer? Looking forward to collaborating! |
Thanks, I would be happy to collaborate with you. Let me work on the writer implementation ticket. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. You may re-open it if you need more time. |
"context" | ||
) | ||
|
||
// Conn is an s abstraction of clickhouse.Conn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment has a grammatical error. It should be "Conn is an abstraction of clickhouse.Conn" (without the extra 's'). This would make the interface definition clearer.
// Conn is an s abstraction of clickhouse.Conn | |
// Conn is an abstraction of clickhouse.Conn |
Spotted by Diamond
Is this helpful? React 👍 or 👎 to let us know.
conn client.Conn | ||
} | ||
|
||
// NewTraceWriter creates a TraceWriter instance, using connection poo to write traces to ClickHouse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a typo in the comment for NewTraceWriter()
. It should read "using connection pool" instead of "using connection poo".
// NewTraceWriter creates a TraceWriter instance, using connection poo to write traces to ClickHouse. | |
// NewTraceWriter creates a TraceWriter instance, using connection pool to write traces to ClickHouse. |
Spotted by Diamond
Is this helpful? React 👍 or 👎 to let us know.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. You may re-open it if you need more time. |
Which problem is this PR solving?
Description of the changes
chpool
for the write path and theclickhouse-go
client for the read path, using a factory to hold all resources.How was this change tested?
Checklist
jaeger
:make lint test
jaeger-ui
:npm run lint
andnpm run test