-
Notifications
You must be signed in to change notification settings - Fork 622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(stream): add dedicated metrics for sync log store #20907
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
3fd1c42
to
9315111
Compare
3f41c4e
to
0cbff0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you include a screenshot of the grafana dashboard applied with this PR?
0cbff0f
to
83e5903
Compare
32698a4
to
fc13ccd
Compare
fc13ccd
to
8d14284
Compare
/// `target`: refers to the target of the log store, | ||
/// for instance `MySql` Sink, PG sink, etc... | ||
/// or unaligned join. | ||
pub(crate) fn new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think metrics here should only be used by synced kv log store, and should have a fixed target? For metrics not shared with unsynced kv log store, we don't even need the target label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be used by decoupled sink into table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we going to specify the target in the future when we have multiple usages on synced kv log store executor? From the current stream node, we don't have field to store the target yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add it in the stream node later, when we use it for decoupled sink.
"", | ||
[ | ||
panels.target( | ||
f"sum({metric('sync_kv_log_store_state')}) by (type, fragment_id, relation)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use rate
to measure the rate of transition, instead of the counter value itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if we want to monitor the current state rather than the transition rate, we may use gauge rather than counter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to store 2 counters, and subtract them from each other. If we directly use gauge, if the state transitions to dirty, and in a short instant quickly changes to clean, but only does so occasionally, it might be missed when prometheus collects metrics. This is because prometheus only collects at some fixed interval. The metric value will just stay at 0
. On the other hand, if I collect both clean and dirty, I'm able to differentiate this, since I can check the clean and dirty state transitions.
I will hide the first panel in this group, and only expose the current state metric panel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. If so, I think we can replace the sum
in the first panel with rate
, and unhide it to shows the transition rate into the two states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
003ed21
to
34ad60a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share a screenshot on grafana of the latest code?
yield Message::Barrier(barrier); | ||
self.metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In current implementation we have to carefully ensure that we do this measurement in every yield point. We may make this logic of measuring back-pressure to be more general and simpler.
It can be a wrapper over any inner stream. When it implement Stream
, it start a timer every time its poll_next
returns ready, and record the time elpased between its next call on poll_next
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
tracing::trace!("resuming paused future"); | ||
if let Some(sleep_future) = sleep_future { | ||
let deadline = sleep_future.deadline(); | ||
let now = Instant::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check here seems unnecessary. Neither the warning log nor the trace log seems to provide notable information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
"", | ||
[ | ||
panels.target( | ||
f"sum({metric('sync_kv_log_store_state')}) by (type, fragment_id, relation)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. If so, I think we can replace the sum
in the first panel with rate
, and unhide it to shows the transition rate into the two states.
16c7e1e
to
96449c2
Compare
7152be4
to
7a074cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM.
@@ -113,12 +113,12 @@ pub mod metrics { | |||
// state of the log store | |||
pub unclean_state: LabelGuardedIntCounter<5>, | |||
pub clean_state: LabelGuardedIntCounter<5>, | |||
pub wait_next_poll_ns: LabelGuardedIntCounter<4>, | |||
pub wait_next_poll_ns: Option<LabelGuardedIntCounter<4>>, // Allow us to take it later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can clone it for simplicity when we use it.
@@ -427,10 +430,10 @@ impl<S: LocalStateStore> WriteFuture<S> { | |||
stream: BoxedMessageStream, | |||
write_state: LogStoreWriteState<S>, | |||
) -> Self { | |||
let instant = Instant::now() + duration; | |||
tracing::trace!(?instant, ?duration, "write_future_pause"); | |||
tracing::trace!(now = ?Instant::now(), ?duration, "write_future_pause"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems redundant to call Instant::now()
in consecutively 3 lines. Can call once and reuse later.
/// `target`: refers to the target of the log store, | ||
/// for instance `MySql` Sink, PG sink, etc... | ||
/// or unaligned join. | ||
pub(crate) fn new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we going to specify the target in the future when we have multiple usages on synced kv log store executor? From the current stream node, we don't have field to store the target yet.
Co-authored-by: William Wen <[email protected]>
c064c27
to
364fba8
Compare
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Checklist
Documentation
Release note