-
Above is a round 1 metrics pass of Clair and ClairCore. A round 2 metrics pass will follow with component-specific instrumentation such as "layer fetcher counts", "indexer counts", etc.
-
These look good, with the proviso that these
👍
I'll defer to you on these. They seem like generally useful operation stats.
All of these seem more like tracing concerns than metrics concerns.
I can see a better argument for exposing timings around database queries, though. I think the operator use case is bisecting "slowness" between the application and the database, and helping us find query hotspots.
-
Just to confirm: can any timing data we would have gotten by instrumenting function durations be obtained on a live instance via pprof? If that's the case, I'll nix the "function" instrumentations from the spec. I just want to ensure we will be able to see function call timing at run time.
-
Clair V4 Metrics Specification
This specification outlines how Clair V4 must implement metrics.
Metric collection is assumed to be driven by Prometheus and thus influences some key behaviors:
Clair and ClairCore
Clair implements the HTTP layer, notifier subsystem, and authentication.
ClairCore implements the internal API consisting of business logic and data store queries.
Metric instrumentation rules for both components must be defined.
Clair
HTTP Layer
Each API endpoint Clair exposes will be associated with the following key metrics:
- Request Count
- Request Size
- Response Size
- Request Duration
A per-endpoint metric strategy is chosen to combat a multiplicative increase in time-series cardinality as future development adds endpoints to Clair.
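As a minimal sketch of the per-endpoint naming scheme, the helper below derives a metric name from an endpoint path. The function name `metricName`, the example path, and the sanitization rule (lowercase, runs of non-alphanumeric characters collapsed to one underscore) are assumptions for illustration; the specification only fixes the `clair_http_{endpoint}_...` format.

```go
package main

import (
	"fmt"
	"strings"
)

// metricName builds a per-endpoint metric name of the form
// clair_http_{endpoint}_{suffix}. The sanitization rule is an assumption:
// the spec does not say how an endpoint path maps to {endpoint}.
func metricName(endpoint, suffix string) string {
	var b strings.Builder
	prevUnderscore := true // suppresses a leading underscore
	for _, r := range strings.ToLower(endpoint) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
			prevUnderscore = false
		} else if !prevUnderscore {
			b.WriteByte('_')
			prevUnderscore = true
		}
	}
	name := strings.TrimSuffix(b.String(), "_")
	return "clair_http_" + name + "_" + suffix
}

func main() {
	// Hypothetical endpoint path, used only for illustration.
	fmt.Println(metricName("/indexer/api/v1/index_report", "request_total"))
}
```

Because the endpoint is baked into the metric name rather than carried as a label, each new endpoint adds a fixed set of series instead of multiplying every existing label combination.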
Request Count
Metric Name Format: clair_http_{endpoint}_request_total
Metric Type: Counter
Labels:
Observables:
Request Size
Metric Name Format: clair_http_{endpoint}_request_size_bytes
// <= 250 bytes, <= 500 bytes, <= 1kb, <= 250kb, <= 500kb, <= 1mb, <= 250mb, <= 500mb
Metric Type: Histogram(-inf, 250, 500, 1000, 25e4, 5e5, 1e6, 2.5e8, 5e+8, +inf)
Unit: bytes
Observables:
Response Size
Metric Name Format: clair_http_{endpoint}_response_size_bytes
Metric Type: Histogram(-inf, 250, 500, 1000, 25e4, 5e5, 1e6, 2.5e8, 5e+8, +inf)
Unit: bytes
Observables:
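To make the size buckets concrete, here is a small sketch of how observations fall into the bounds listed above. It follows the Prometheus convention that buckets are cumulative, so each bucket counts every observation less than or equal to its upper bound. The helper names and the sample sizes are hypothetical, and a real implementation would use a Prometheus client library rather than this hand-rolled counter.

```go
package main

import (
	"fmt"
	"sort"
)

// sizeBounds are the finite upper bounds from the spec, in bytes.
var sizeBounds = []float64{250, 500, 1000, 25e4, 5e5, 1e6, 2.5e8, 5e8}

// bucketIndex returns the index of the first bound that is >= v,
// or len(bounds) when v falls into the implicit +Inf bucket.
func bucketIndex(bounds []float64, v float64) int {
	return sort.SearchFloat64s(bounds, v)
}

// cumulativeCounts mirrors how Prometheus reports histogram buckets: entry i
// counts every observation <= bounds[i]; the last entry is the +Inf bucket.
func cumulativeCounts(bounds, observations []float64) []uint64 {
	counts := make([]uint64, len(bounds)+1)
	for _, v := range observations {
		counts[bucketIndex(bounds, v)]++
	}
	for i := 1; i < len(counts); i++ {
		counts[i] += counts[i-1]
	}
	return counts
}

func main() {
	// Hypothetical request sizes in bytes.
	obs := []float64{120, 250, 4096, 750000, 9e8}
	fmt.Println(cumulativeCounts(sizeBounds, obs))
}
```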
Request Duration
Metric Name Format: clair_http_{endpoint}_request_duration_seconds
// <= 30ms, <= 100ms, <= 500ms, <= 1 second, <= 5 seconds, <= 10 seconds, <= 30 seconds, <= 1 minute, <= 5 minutes, <= 10 minutes
Metric Type: Histogram(-inf, .030, .1, .5, 1, 5, 10, 30, 60, 300, 600, +inf)
Unit: seconds
Observables:
Authentication
Error Count
Metric Name Format: clair_auth_error_total
Metric Type: Counter
Labels:
Observables:
Notifier Subsystem
Notification Status Counts
Metric Name Format: "clair_notifier_notification_status_total"
Metric Type: Gauge
Observables:
Labels:
Notification Counts
Metric Name Format: "clair_notifier_notification_total"
Metric Type: Counter
Observables:
Labels:
Notifier Polls
Metric Name Format: "clair_notifier_poll_total"
Metric Type: Counter
Observables:
Labels:
ClairCore
General API
Function Count
Metric Name Format: claircore_{component}_function_total
Metric Type: Counter
Observables:
- Total count of calls to a component
- Total count of calls to a component's specific function
- Rate of a component's function calls
- Rate of a component's specific function calls
Labels:
- func: instrumented function of the given component
Function Duration
Metric Name Format: claircore_{component}_function_duration_seconds
// <= 50ns, <= 250ns, <= 500ns, <= 750ns, <= 1ms, <= 50ms, <= 250ms, <= 500ms, <= 750ms, <= 1 second, <= 15 seconds, <= 30 seconds, <= 60 seconds
Metric Type: Histogram(5e-8, 2.5e-7, 5e-7, 7.5e-7, 0.001, 0.05, 0.25, 0.5, 0.75, 1, 15, 30, 60)
Observables:
- Average duration of a component's function calls
- Quantile ranking of the duration of a component's function calls
Labels:
- func: instrumented function of the given component
The above section was deemed to be a tracing concern; the pprof server will be used instead.
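For reference, a minimal sketch of exposing pprof in a Go process: importing `net/http/pprof` for its side effect registers the `/debug/pprof/` handlers on `http.DefaultServeMux`. The `pprofStatus` helper is hypothetical and only checks that the handlers respond; how Clair actually mounts or serves them is not stated in this specification.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

// pprofStatus sends an in-process GET for path to the default mux
// and returns the resulting HTTP status code.
func pprofStatus(path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("pprof index status:", pprofStatus("/debug/pprof/"))
}
```

On a live instance that serves the default mux, `go tool pprof http://<host>:<port>/debug/pprof/profile?seconds=30` collects a CPU profile. Note that a CPU profile is sampling-based: it shows where time is spent in aggregate, not the duration of individual calls.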
General Database
Query Count
Metric Name Format: claircore_{store}_{function}_total || clair_{store}_{function}_total
Metric Type: Counter
Observables:
Labels:
Query Duration
Metric Name Format: claircore_{store}_{function}_duration_seconds || clair_{store}_{function}_duration_seconds
// prometheus default bucket size
Metric Type: Histogram(.005, .01, .025, .05, .075, .1, .25, .5, .75, 1, 2.5, 5, 7.5, 10)
Observables:
Labels: