Skip to content

Conversation

zhengkezhou1
Copy link
Contributor

@zhengkezhou1 zhengkezhou1 commented Apr 28, 2025

Which problem is this PR solving?

Description of the changes

  • Introduce chpool for the write path and the clickhouse-go client for the read path, using a factory to hold all resources.

How was this change tested?

  • unit tests

Checklist

@zhengkezhou1 zhengkezhou1 requested a review from a team as a code owner April 28, 2025 15:51
Copy link

codecov bot commented Apr 28, 2025

Codecov Report

❌ Patch coverage is 47.56098% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.94%. Comparing base (70e6a30) to head (24dddb0).
⚠️ Report is 243 commits behind head on main.

Files with missing lines Patch % Lines
internal/storage/v2/clickhouse/wrapper/wrapper.go 0.00% 31 Missing ⚠️
...nternal/storage/v2/clickhouse/tracestore/reader.go 28.57% 10 Missing ⚠️
...nternal/storage/v2/clickhouse/tracestore/writer.go 66.66% 2 Missing ⚠️

❌ Your patch check has failed because the patch coverage (47.56%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7082      +/-   ##
==========================================
- Coverage   96.14%   95.94%   -0.20%     
==========================================
  Files         358      363       +5     
  Lines       21585    21667      +82     
==========================================
+ Hits        20753    20789      +36     
- Misses        627      672      +45     
- Partials      205      206       +1     
Flag Coverage Δ
badger_v1 9.91% <ø> (ø)
badger_v2 2.05% <ø> (ø)
cassandra-4.x-v1-manual 14.91% <ø> (ø)
cassandra-4.x-v2-auto 2.04% <ø> (ø)
cassandra-4.x-v2-manual 2.04% <ø> (ø)
cassandra-5.x-v1-manual 14.91% <ø> (ø)
cassandra-5.x-v2-auto 2.04% <ø> (ø)
cassandra-5.x-v2-manual 2.04% <ø> (ø)
elasticsearch-6.x-v1 20.17% <ø> (ø)
elasticsearch-7.x-v1 20.25% <ø> (ø)
elasticsearch-8.x-v1 20.43% <ø> (ø)
elasticsearch-8.x-v2 2.05% <ø> (ø)
grpc_v1 11.46% <ø> (ø)
grpc_v2 2.05% <ø> (ø)
kafka-3.x-v1 10.19% <ø> (ø)
kafka-3.x-v2 2.05% <ø> (ø)
memory_v2 2.05% <ø> (ø)
opensearch-1.x-v1 20.30% <ø> (ø)
opensearch-2.x-v1 20.30% <ø> (ø)
opensearch-2.x-v2 2.05% <ø> (ø)
query 2.05% <ø> (ø)
tailsampling-processor 0.55% <ø> (ø)
unittests 94.76% <47.56%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment on lines 31 to 38
Username string `mapstructure:"username"`
Password string `mapstructure:"password"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the only form of auth CH supports? What about TLS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both of them are supported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point was that we usually have auth: section in the config with different options, plain text username/password is one of the options. Since you don't actually have code here that uses configuration to create CH connection objects it's hard to tell. Why not have Factory and Config in one PR, instead of reader/writer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect the configuration to be extensible:

extensions:
  jaeger_storage:
    clickhouse:
      servers: [...]
      auth:
        basic:
          username:
          password:

}

// ClientConfig holds the client configuration used to connect to the server within the chpool.
type ClientConfig struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different from ClickhouseConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main difference is that clickhouse-go supports load balancing and fault tolerance, while ch-go does not. When in use, clickhouse-go accepts an array of addresses, whereas ch-go only accepts a single address.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, if nest them together, only use single-node mode here. If there's a need to use multi-node, then ch-go support would be required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When in use, clickhouse-go accepts an array of addresses, whereas ch-go only accepts a single address.

This is an internal implementation issue, it doesn't mean the end user needs to be exposed to this dichotomy.

The main difference is that clickhouse-go supports load balancing and fault tolerance, while ch-go does not.

This sounds like it should rule out ch-go for us, does it not? Connecting to a single server isn't a fault tolerant setup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, if the user writes only servers: ['localhost:9000'] in the configuration, then ch-go will also use it. However, if it's servers: [a:1000, b:2000, c:3000], then we randomly pick one to give to ch-go.

@zhengkezhou1 zhengkezhou1 force-pushed the feat/introduce-clickhouse-client branch from 0333ce1 to 74255e2 Compare April 30, 2025 11:40
Comment on lines 32 to 30
func NewFactory(ctx context.Context, cfg Config, poolFn newPoolFn, connFn newConnFn) (*Factory, error) {
f := &Factory{
cfg: cfg,
poolFn: poolFn,
connFn: connFn,
}
err := f.initConnections(ctx)
return f, err
}
Copy link
Contributor Author

@zhengkezhou1 zhengkezhou1 Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's difficult to test functions with the signature context.Context, cfg Config because this function establishes a real connection with the ClickHouse server through initConnections, which requires us to provide a real server environment. The initConnections function plays a crucial role in this process. It relies on accurate context and configuration information to connect to the server, and providing such a real - world environment for testing is challenging.
I've looked into the testing methods of other libraries. For example, clickhouse-go uses TestContainers to provide a real testing environment, and ch-go uses a real ClickHouse server in its E2E tests. However, implementing these methods in our testing scenario also presents certain difficulties.
Due to the above - mentioned testing challenges, it's not suitable to design the function signature as context.Context, cfg Config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can take inspiration from Elasticsearch. We can create client abstractions which can be used to mock the response from clients for code coverage in unit tests. The behaviour with real click-house server can be tested in E2E Tests by following the same steps as that of other storage. Regarding signature, something similar is happening in ES getClient() function, similar approach can be employed here to remove poolFn and connFn form the function parameters. I hope this will help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhengkezhou1 con't do connections in constructor, constructor may only validate parameters, but better when it doesn't return errors at all. Have a Start function that actually tries to create connection. It can do that via functions captured in the struct by the constructor, which allows you to swap those functions in the unit test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @Manik2708 says, using an abstraction client makes testing easy. I think bringing back the wrapper is a good choice.

}

// NewTraceReader creates a TraceReader instance, using the connection to get traces from ClickHouse.
func NewTraceReader(conn client.Conn) *TraceReader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constructor should be the first function in the file, after struct definition

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read more about chpool driver and I think we need to stay away from it, only use clickhouse-go. The fact that chpool does not accept multiple addresses is just a big bottleneck, the write path is the most intensive for traces but the driver provides no load balancing.

Comment on lines 31 to 38
Username string `mapstructure:"username"`
Password string `mapstructure:"password"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect the configuration to be extensible:

extensions:
  jaeger_storage:
    clickhouse:
      servers: [...]
      auth:
        basic:
          username:
          password:

@zhengkezhou1
Copy link
Contributor Author

OK, I will drop it in another PR.

@zhengkezhou1 zhengkezhou1 force-pushed the feat/introduce-clickhouse-client branch from 65a2616 to 24dddb0 Compare May 8, 2025 16:30
@mahadzaryab1
Copy link
Collaborator

mahadzaryab1 commented May 18, 2025

Hey @zhengkezhou1!

I'm jumping in to help drive the effort to bring ClickHouse support to Jaeger. Thanks so much for all your contributions so far!

I had a chance to review the open PRs, and I think it would help to take a more incremental approach so we can keep the momentum going and make the review process smoother. To that end, I’ve scoped out the overall effort and broken it down into smaller sub-issues under the main tracking issue: #5058.

To get started, it probably makes the most sense to begin at the foundation: implementing the Reader and Writer. Once we have those in place and in good shape, we can move on to the higher-level work like the factory, configuration, and documentation.

If you're still interested in contributing, feel free to let me know which sub-issues you'd like to tackle. I’d be happy to coordinate and split up the work so we can move efficiently and avoid overlap. Maybe we can start by splitting the work for the reader and writer?

Looking forward to collaborating!

@yurishkuro
Copy link
Member

Cc @zhengkezhou1

@zhengkezhou1
Copy link
Contributor Author

Thanks, I would be happy to collaborate with you. Let me work on the writer implementation ticket.

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. You may re-open it if you need more time.

@github-actions github-actions bot added the stale The issue/PR has become stale and may be auto-closed label Jul 28, 2025
"context"
)

// Conn is an s abstraction of clickhouse.Conn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment has a grammatical error. It should be "Conn is an abstraction of clickhouse.Conn" (without the extra 's'). This would make the interface definition clearer.

Suggested change
// Conn is an s abstraction of clickhouse.Conn
// Conn is an abstraction of clickhouse.Conn

Spotted by Diamond

Is this helpful? React 👍 or 👎 to let us know.

conn client.Conn
}

// NewTraceWriter creates a TraceWriter instance, using connection poo to write traces to ClickHouse.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a typo in the comment for NewTraceWriter(). It should read "using connection pool" instead of "using connection poo".

Suggested change
// NewTraceWriter creates a TraceWriter instance, using connection poo to write traces to ClickHouse.
// NewTraceWriter creates a TraceWriter instance, using connection pool to write traces to ClickHouse.

Spotted by Diamond

Is this helpful? React 👍 or 👎 to let us know.

@github-actions github-actions bot removed the stale The issue/PR has become stale and may be auto-closed label Aug 4, 2025
Copy link

github-actions bot commented Oct 6, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. You may re-open it if you need more time.

@github-actions github-actions bot added the stale The issue/PR has become stale and may be auto-closed label Oct 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/storage stale The issue/PR has become stale and may be auto-closed v2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants