Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-34558][state] Support tracking state size #25837

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

hejufang
Copy link

@hejufang hejufang commented Dec 22, 2024

What is the purpose of the change

Support tracking state key size and value size and reporting related metrics by metrics reporter.

Brief change log

Add key /value size statistics when accessing state. 
Add configurations to control whether this feature is enabled (disabled by default) and configure the sampling rate.

Verifying this change

This change added tests and can be verified as follows:
MetricsTrackingValueStateTest
MetricsTrackingMapStateTest
MetricsTrackingListStateTest
MetricsTrackingReducingStateTest
MetricsTrackingAggregatingStateTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? not documented

@flinkbot
Copy link
Collaborator

flinkbot commented Dec 22, 2024

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Copy link
Contributor

@Zakelly Zakelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! In general I have a question: why serializing the key/value by sampling, instead of read the serialization result/length of current put/add, especially for RocksDB? WDYT @masteryhx


@Documentation.Section(Documentation.Sections.STATE_SIZE_TRACKING)
public static final ConfigOption<Boolean> SIZE_TRACK_ENABLED =
ConfigOptions.key("state.backend.size-track.keyed-state-enabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use state.size-track as prefix.

@@ -36,16 +38,52 @@
* @param <S> Type of the internal kv state
* @param <LSM> Type of the latency tracking state metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the description accordingly.

throws Exception {
if (latencyTrackingStateConfig.isEnabled()) {
if (latencyTrackingStateConfig.isEnabled() || sizeTrackingStateConfig.isEnabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the LatencyTrackingStateFactory is still used, why this is not renamed?

@hejufang
Copy link
Author

hejufang commented Jan 6, 2025

Thanks for the PR! In general I have a question: why serializing the key/value by sampling, instead of read the serialization result/length of current put/add, especially for RocksDB? WDYT @masteryhx

@Zakelly In my opinion, serializing the key/value by sampling has the following reasons:

  1. Unlike state access latency, the size of key/value cannot be simply obtained in the outer MetricsTrackingState. For example, in RocksDBValueState#value valueBytes.length can only be accessed within the method
  2. Serializing the key/value by sampling can also collect statistics on other types of statebackend, not only for RocksDB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants