-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetching external metrics from Prometheus #1711
Conversation
hermes-test-helper/src/main/java/pl/allegro/tech/hermes/test/helper/client/Hermes.java
Show resolved
Hide resolved
...st-helper/src/main/java/pl/allegro/tech/hermes/test/helper/endpoint/HermesAPIOperations.java
Show resolved
Hide resolved
...l/allegro/tech/hermes/management/infrastructure/prometheus/RestTemplatePrometheusClient.java
Show resolved
Hide resolved
@JsonProperty("data") Data data) { | ||
|
||
boolean isSuccess() { | ||
return status.equals("success") && data.isVector(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: It seems that we're only interested in vector metrics, right?
} | ||
|
||
@JsonIgnoreProperties(ignoreUnknown = true) | ||
record Result( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow-up suggestion: If we are interested only in vector metrics, then maybe we can change this record name to VectorResult
? Accordingly, values
(L26) could be renamed to vector
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we are only interested in vectors, good idea to change its name.
.../src/integration/java/pl/allegro/tech/hermes/integration/setup/HermesManagementInstance.java
Show resolved
Hide resolved
...ech/hermes/integration/management/ListUnhealthySubscriptionsForOwnerBasedOnGraphiteTest.java
Show resolved
Hide resolved
...ech/hermes/integration/management/ListUnhealthySubscriptionsForOwnerBasedOnGraphiteTest.java
Outdated
Show resolved
Hide resolved
...ech/hermes/integration/management/ListUnhealthySubscriptionsForOwnerBasedOnGraphiteTest.java
Show resolved
Hide resolved
this.prometheusMetricsCache = CacheBuilder.newBuilder() | ||
.ticker(ticker) | ||
.expireAfterWrite(cacheTtlInSeconds, SECONDS) | ||
.maximumSize(cacheSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potential issue: Default cache size is 100 000. Isn't it too much (for Prometheus metrics)?
Prometheus cached metrics take more memory compared to Graphite cached metrics.
Graphite cached metric:
public class MetricDecimalValue {
private final boolean available;
private final String value;
...
}
Prometheus cached metric:
public class MonitoringMetricsContainer {
private final Map<String, MetricDecimalValue> metrics; // <- A Map - it can store multiple elements per one cache entry
private final boolean isAvailable;
...
}
What do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference should not be huge, because:
- CachingGraphiteClient fetches multiple metrics by
readMetrics(String... multipleMetricPaths)
and dispatches them into multiple enitres:
cache[singleMetricPath1] = MetricDecimalValue(...)
cache[singleMetricPath2] = MetricDecimalValue(...)
- CachingPromertheusClient fetches multiple metrics with one query
readMetrics(String query)
and put them into single element
cache[query] = Map<String, MetricsDecimalValue(...)>
The cache hit will be the same in both cases despite the fact that CachingGraphiteClient stores metrics more granularly: because we always query in the context of single subscription/topic so no metrics paths are shared between different queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now we don't have any cache metrics. but we should probably add a few in the future.
question: One more thing - what about the documentation? :) Would you like to approach it in separate PR, or you can add docs about the Prometheus metrics in this PR? |
import java.util.stream.Collectors; | ||
import java.util.stream.Stream; | ||
|
||
public class PrometheusMetricsProvider implements MonitoringSubscriptionMetricsProvider, MonitoringTopicMetricsProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: we should probably name this one VictoriaMetricsMetricsProvider
|
||
import java.util.function.Supplier; | ||
|
||
import static org.apache.commons.lang.exception.ExceptionUtils.getRootCauseMessage; | ||
import static pl.allegro.tech.hermes.common.metric.HermesMetrics.escapeDots; | ||
|
||
@Component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to be able to disable the whole monitoring but we can do that in the future
No description provided.