[router]Integrate router request_size, response_size and key_size metrics to otel #1939
@@ -101,7 +103,8 @@ public class RouterHttpRequestStats extends AbstractVeniceHttpStats {
private final Sensor badRequestKeyCountSensor;

/** OTel metrics yet to be added */
Nit: Move this comment to be below the requestSizeMetric metric.
CALL_SIZE.getMetricEntity(),
otelRepository,
this::registerSensorFinal,
RouterTehutiMetricNameEnum.REQUEST_SIZE,
We will add RESPONSE_SIZE to this PR as well, right?
otelRepository,
this::registerSensorFinal,
RouterTehutiMetricNameEnum.REQUEST_SIZE,
singletonList(new Avg()),
The original requestSizeSensor was not just Avg(); it had TehutiUtils.getPercentileStat(getName(), getFullMetricName("request_size")), new Avg().
@@ -179,6 +184,8 @@ public RouterHttpRequestStats(
Rate requestRate = new OccurrenceRate();
Rate healthyRequestRate = new OccurrenceRate();
Rate tardyRequestRate = new OccurrenceRate();
SampledStat requestSize = new Avg();
Is requestSize used somewhere else? If not, I think we can remove the local variable.
* Size of request and response in bytes
*/
CALL_SIZE(
MetricType.HISTOGRAM, MetricUnit.NUMBER, "Size of request and response in bytes",
@m-nagarajan, probably a question for you: do we have to define a new unit for bytes here?
Added a new unit for BYTES.
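For readers following the unit discussion, here is a minimal sketch of what a unit enum with a bytes entry could look like. The enum name MetricUnitSketch, the accessor getUnit(), and the "Number" string are assumptions for illustration, not the PR's actual MetricUnit code; "By" is the UCUM code that OpenTelemetry conventions use for bytes.

```java
/** Hypothetical stand-in for a metric unit enum; not the project's actual MetricUnit. */
enum MetricUnitSketch {
  // Assumed string for plain counts; the real enum may use a different value.
  NUMBER("Number"),
  // "By" is the UCUM code for bytes.
  BYTES("By");

  private final String unit;

  MetricUnitSketch(String unit) {
    this.unit = unit;
  }

  /** Returns the unit string that would be attached to the instrument. */
  String getUnit() {
    return unit;
  }
}
```

With an entry like this, a size histogram such as CALL_SIZE could declare bytes as its unit instead of a generic number.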
assertEquals(
metricsRepository.getMetric("." + storeName + "--" + REQUEST_SIZE.getMetricName() + ".Avg").value(),
512.0);
Looks like this only tests the Tehuti metrics part. Can we also add a test to verify the correctness of the OTEL metrics?
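A rough sketch of what such an OTel-side check might assert. In a real test the histogram point would normally come from the OpenTelemetry SDK (for example through an in-memory metric reader); the class below, HistogramPointSketch, is a hypothetical JDK-only stand-in that just shows which fields are worth checking after recording a single 512-byte request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical JDK-only stand-in for a histogram data point; not an OpenTelemetry class. */
final class HistogramPointSketch {
  private final List<Double> recorded = new ArrayList<>();

  /** Records one observation, e.g. a request size in bytes. */
  void record(double value) {
    recorded.add(value);
  }

  long getCount() {
    return recorded.size();
  }

  double getSum() {
    return recorded.stream().mapToDouble(Double::doubleValue).sum();
  }

  /** Smallest recorded value, or NaN when nothing has been recorded. */
  double getMin() {
    return recorded.stream().mapToDouble(Double::doubleValue).min().orElse(Double.NaN);
  }

  /** Largest recorded value, or NaN when nothing has been recorded. */
  double getMax() {
    return recorded.stream().mapToDouble(Double::doubleValue).max().orElse(Double.NaN);
  }
}
```

Mirroring the Tehuti assertion above, a test that records 512 once would expect a count of 1 and a sum, min, and max of 512.0, ideally alongside a check of the store name attribute on the point.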
…sts are passing now
if (isKeyValueProfilingEnabled) {
if (storeName.equals(STORE_NAME_FOR_TOTAL_STAT)) {
I don't quite follow why we changed the condition here.
The condition was updated so that metric creation depends on whether key/value profiling is enabled (isKeyValueProfilingEnabled) and on which store is being monitored (storeName).
- When isKeyValueProfilingEnabled is true:
  - If storeName.equals(STORE_NAME_FOR_TOTAL_STAT), we only want to emit Tehuti metrics (not OTEL). So we create the metrics the same way, but pass null for OTEL to ensure only Tehuti sensors are registered.
  - For other stores (i.e., storeName != STORE_NAME_FOR_TOTAL_STAT), we only want to emit OTEL metrics (not Tehuti), so we skip the Tehuti sensor setup.
- When isKeyValueProfilingEnabled is false:
  - We fall back to the default behavior: emit OTEL metrics for response size only (no key size sensors).
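The branching described in this reply can be sketched as a small decision function. This is one reading of the explanation, not the PR's code: the class ProfilingEmissionSketch, the enums Backend and SizeMetric, the method backendsFor, and the "total" value for STORE_NAME_FOR_TOTAL_STAT are all assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of the emission rules described above; names are illustrative. */
final class ProfilingEmissionSketch {
  // Assumed value of the pseudo-store used for aggregate stats.
  static final String STORE_NAME_FOR_TOTAL_STAT = "total";

  enum Backend { TEHUTI, OTEL }

  enum SizeMetric { KEY_SIZE, RESPONSE_SIZE }

  /** Returns the backends that would emit the given metric for the given store. */
  static Set<Backend> backendsFor(
      SizeMetric metric, boolean isKeyValueProfilingEnabled, String storeName) {
    if (isKeyValueProfilingEnabled) {
      // Total stats: Tehuti only. Per-store stats: OTel only.
      return STORE_NAME_FOR_TOTAL_STAT.equals(storeName)
          ? EnumSet.of(Backend.TEHUTI)
          : EnumSet.of(Backend.OTEL);
    }
    // Profiling disabled: only response size is emitted, and only through OTel.
    return metric == SizeMetric.RESPONSE_SIZE
        ? EnumSet.of(Backend.OTEL)
        : EnumSet.noneOf(Backend.class);
  }
}
```

Writing the rules as a pure function like this makes each of the three cases easy to cover with a unit test, independent of how the sensors are actually registered.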
Problem Statement
Add router request_size, response_size and key_size metrics in OTel
Solution
This change add
KeyValueProfilingEnabled
Code changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
- Synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
- Thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
How was this PR tested?
Does this PR introduce any user-facing or breaking changes?