Metrics: Record incoming request sizes #4495
Conversation
I don't think we need the metric for any other endpoints. Auction, amp and video endpoints are sufficient.
metrics/prometheus/prometheus.go
Outdated
```diff
@@ -168,6 +170,7 @@ func NewMetrics(cfg config.PrometheusMetrics, disabledMetrics config.DisabledMet
 	priceBuckets := []float64{250, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000}
 	queuedRequestTimeBuckets := []float64{0, 1, 5, 30, 60, 120, 180, 240, 300}
 	overheadTimeBuckets := []float64{0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}
+	requestSizeBuckets := []float64{1000, 5000, 10000, 15000, 20000, 40000, 75000, 100000}
```
How were these buckets chosen? Are they based off of any real requests? Have we seen at least one request that falls into each of these buckets or at least most of them?
This is difficult to say... My test request, which exercises many request-related features (stored bid responses, FPD, multi-type imps, category mapping, etc.), is ~4500 bytes.
Should we first see what production data looks like and adjust the buckets accordingly?
> How were these buckets chosen?

Arbitrarily. These have to be discussed and carefully chosen first.
metrics/prometheus/prometheus.go
Outdated
```go
endpoint := metrics.GetEndpointFromRequestType(labels.RType)
m.requestsSize.With(prometheus.Labels{
	requestEndpointLabel: string(endpoint),
}).Observe(float64(labels.RequestSize))
```
I tested this PR with the AMP endpoint and found that if the stored request was not found and the request length was never measured, we still write 0 bytes to the metric.
It falls into the very first bucket and looks like this in the result:
```
# TYPE pbs_sub_request_size_bytes histogram
pbs_sub_request_size_bytes_bucket{request_size="amp",le="1000"} 1
pbs_sub_request_size_bytes_bucket{request_size="amp",le="5000"} 1
pbs_sub_request_size_bytes_bucket{request_size="amp",le="10000"} 1
pbs_sub_request_size_bytes_bucket{request_size="amp",le="15000"} 1
pbs_sub_request_size_bytes_bucket{request_size="amp",le="20000"} 1
pbs_sub_request_size_bytes_bucket{request_size="amp",le="40000"} 1
pbs_sub_request_size_bytes_bucket{request_size="amp",le="75000"} 1
pbs_sub_request_size_bytes_bucket{request_size="amp",le="100000"} 1
pbs_sub_request_size_bytes_bucket{request_size="amp",le="+Inf"} 1
```
It can be misinterpreted as the request having been under 1000 bytes, when in fact it was an error.
Should we add a check for `labels.RequestStatus`? Only write the metric if `labels.RequestStatus == metrics.RequestStatusOK`?
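The proposed guard can be sketched as follows. This is a minimal, self-contained illustration of the idea, not prebid-server code: `Labels`, `RequestStatusOK`, and `recordSize` are stand-ins named after the identifiers discussed in this thread, and the `observe` callback stands in for the Prometheus histogram's `Observe`.

```go
package main

import "fmt"

// RequestStatus is a hypothetical stand-in for metrics.RequestStatus.
type RequestStatus string

const RequestStatusOK RequestStatus = "ok"

// Labels mimics the subset of the metrics labels used in this PR.
type Labels struct {
	RequestStatus RequestStatus
	RequestSize   int
}

// recordSize only observes the request size when the request succeeded,
// so failed stored-request lookups (size 0) never land in the first bucket.
func recordSize(l Labels, observe func(float64)) {
	if l.RequestStatus != RequestStatusOK {
		return
	}
	observe(float64(l.RequestSize))
}

func main() {
	var seen []float64
	record := func(v float64) { seen = append(seen, v) }
	recordSize(Labels{RequestStatus: "error", RequestSize: 0}, record) // skipped
	recordSize(Labels{RequestStatus: RequestStatusOK, RequestSize: 217}, record)
	fmt.Println(seen) // prints [217]
}
```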
> I tested this PR with the AMP endpoint and found that if the stored request was not found and the request length was never measured, we still write 0 bytes to the metric.

Really? Let me go in there with the debugger. Do you mind sharing the AMP request you used for this test?
Doesn't the AMP endpoint need to be updated to set this label in `loadRequestJSONForAmp`? I don't think we are recording the size on this endpoint at the moment.
> Doesn't the AMP endpoint need to be updated

Yes. Corrected.
metrics/prometheus/prometheus.go
Outdated
```diff
@@ -168,6 +170,7 @@ func NewMetrics(cfg config.PrometheusMetrics, disabledMetrics config.DisabledMet
 	priceBuckets := []float64{250, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000}
 	queuedRequestTimeBuckets := []float64{0, 1, 5, 30, 60, 120, 180, 240, 300}
 	overheadTimeBuckets := []float64{0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}
+	requestSizeBuckets := []float64{600, 700, 800, 900, 1000, 1500, 3000, 5000, 6000, 7500, 10000, 15000}
```
This bucket range allows for a max request size of ~15kb. I think that's too small. I've seen very large requests in the wild, especially when privacy signals are involved. Everyone's mileage will vary, but how about we start with 100kb as a max size?
Something like...

```go
requestSizeBuckets := []float64{100, 500, 1000, 2000, 5000, 10000, 20000, 50000, 75000, 100000}
```
Discussed offline.
```go
requestSizeBuckets := []float64{100, 500, 750, 1000, 2000, 4000, 7000, 10000, 15000, 20000, 50000, 75000}
```
Corrected. New buckets:
```diff
 pbs_sub_request_queue_time_sum{request_status="requestRejectedLabel",request_type="video"} 0
 pbs_sub_request_queue_time_count{request_status="requestRejectedLabel",request_type="video"} 0
+# HELP pbs_sub_request_size_bytes Count that keeps track of incoming request size in bytes labeled by endpoint.
+# TYPE pbs_sub_request_size_bytes histogram
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="100"} 0
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="500"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="750"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="1000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="2000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="4000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="7000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="10000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="15000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="20000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="50000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="75000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="auction",le="+Inf"} 1
+pbs_sub_request_size_bytes_sum{request_size="auction"} 217
+pbs_sub_request_size_bytes_count{request_size="auction"} 1
+pbs_sub_request_size_bytes_bucket{request_size="video",le="100"} 0
+pbs_sub_request_size_bytes_bucket{request_size="video",le="500"} 0
+pbs_sub_request_size_bytes_bucket{request_size="video",le="750"} 0
+pbs_sub_request_size_bytes_bucket{request_size="video",le="1000"} 0
+pbs_sub_request_size_bytes_bucket{request_size="video",le="2000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="video",le="4000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="video",le="7000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="video",le="10000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="video",le="15000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="video",le="20000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="video",le="50000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="video",le="75000"} 1
+pbs_sub_request_size_bytes_bucket{request_size="video",le="+Inf"} 1
+pbs_sub_request_size_bytes_sum{request_size="video"} 1790
+pbs_sub_request_size_bytes_count{request_size="video"} 1
 # HELP pbs_sub_request_time_seconds Seconds to resolve successful Prebid Server requests labeled by type.
 # TYPE pbs_sub_request_time_seconds histogram
 pbs_sub_request_time_seconds_bucket{request_type="amp",le="0.05"} 0
```
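The cumulative `le` semantics in that output can be illustrated with a small, self-contained sketch. `cumulativeCounts` is a hypothetical helper, not prebid-server or client_golang code; it just shows why the 217-byte auction request above increments every bucket from `le="500"` upward, not only one.

```go
package main

import "fmt"

// cumulativeCounts shows how a Prometheus histogram records a single
// observation: every bucket whose upper bound ("le") is greater than or
// equal to the observed value is incremented, not just the closest one.
func cumulativeCounts(bounds []float64, value float64) []int {
	counts := make([]int, len(bounds))
	for i, le := range bounds {
		if value <= le {
			counts[i] = 1
		}
	}
	return counts
}

func main() {
	bounds := []float64{100, 500, 750, 1000, 2000, 4000, 7000, 10000, 15000, 20000, 50000, 75000}
	// the 217-byte auction request from the metric output above
	fmt.Println(cumulativeCounts(bounds, 217)) // prints [0 1 1 1 1 1 1 1 1 1 1 1]
}
```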
```diff
@@ -442,6 +442,7 @@ func (deps *endpointDeps) parseRequest(httpRequest *http.Request, labels *metric
 		errs = []error{err}
 		return
 	}
+	labels.RequestSize = len(requestJson)
```
It looks like we are capturing the size of the request that comes in over the wire prior to merging in stored requests for both the auction and video endpoints while the amp endpoint captures the request size after stored requests are merged. IMO this is a bit confusing. Do we even need the amp request size since it is just based on stored requests?
Modified to not log these metrics on the /amp endpoint.
```diff
@@ -169,6 +171,7 @@ func NewMetrics(cfg config.PrometheusMetrics, disabledMetrics config.DisabledMet
 	priceBuckets := []float64{250, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000}
 	queuedRequestTimeBuckets := []float64{0, 1, 5, 30, 60, 120, 180, 240, 300}
 	overheadTimeBuckets := []float64{0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}
+	requestSizeBuckets := []float64{100, 500, 750, 1000, 2000, 4000, 7000, 10000, 15000, 20000, 50000, 75000}
```
These look like pretty hardcoded sizes. I would use `max_request_size / 10` linear buckets:
https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#LinearBuckets
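For reference, a sketch of what that suggestion would produce. The `linearBuckets` helper below mirrors the semantics of `prometheus.LinearBuckets(start, width, count)` so the example stays dependency-free, and the 100000-byte `max_request_size` is an assumed value for illustration, not the project default.

```go
package main

import "fmt"

// linearBuckets mirrors prometheus.LinearBuckets(start, width, count):
// it returns `count` upper bounds spaced `width` apart, starting at `start`.
func linearBuckets(start, width float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start + float64(i)*width
	}
	return buckets
}

func main() {
	maxRequestSize := 100000.0 // assumed configured max_request_size, in bytes
	width := maxRequestSize / 10
	// ten evenly spaced buckets up to the configured maximum
	fmt.Println(linearBuckets(width, width, 10))
	// prints [10000 20000 30000 40000 50000 60000 70000 80000 90000 100000]
}
```

One trade-off worth noting: linear buckets track the configured limit automatically, but the hand-picked buckets above give much finer resolution at the low end, where most requests in this thread (217–4500 bytes) actually land.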
Adds a metric to keep track of the size in bytes of requests that hit the /auction, /amp, and /video endpoints. Should we add the cookie_sync endpoint?
To anyone who reviews, please let me know if you agree with the buckets (1000, 5000, 10000, 15000, 20000, 40000, 75000, and 100000 bytes respectively).