[refactor] AlarmCacheManager refactoring processing logic #3525
Pull Request Overview
This PR refactors the AlarmCacheManager processing logic to support the new alarm definition mechanism by introducing a new defineId field and updating the alert handling logic. Key changes include:
- Adding a new column and migration scripts for define_id in PostgreSQL, MySQL, and H2.
- Updating the SingleAlert entity and related components (calculators, handler, tests) to pass defineId into alert cache operations.
- Refactoring AlarmCacheManager to use a Guava Table with stringified defineId as the key, along with adaptations in the alert calculation logic.
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
db/migration/postgresql/V173__update_column.sql | Adds migration logic to add the define_id column if not present. |
db/migration/mysql/V173__update_column.sql | Uses a stored procedure to conditionally add the define_id column. |
db/migration/h2/V173__update_column.sql | Adds the define_id column with IF NOT EXISTS clause. |
SingleAlert.java | Adds defineId field to the SingleAlert entity. |
AlarmDefineCommonEnum.java | Introduces a new enum constant for COLLECTOR with an associated defineId. |
RealTimeAlertCalculatorMatchTest.java, PeriodicAlertCalculatorTest.java | Updates tests to verify that the new defineId is used in CacheManager operations. |
RealTimeAlertCalculator.java, PeriodicAlertCalculator.java, CollectorAlertHandler.java | Pass defineId to cache manager methods to ensure consistency in handling alerts. |
AlarmCacheManager.java | Refactors pending and firing alert maps to use a Guava Table keyed by defineId and fingerprint, with a fallback for historical alerts. |
Comments suppressed due to low confidence (2)
hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/calculate/AlarmCacheManager.java:65
- [nitpick] Consider renaming the parameter 'fingerPrint' to 'fingerprint' for consistency and clarity across the codebase.
public void putPending(Long defineId, String fingerPrint, SingleAlert alert) {
hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/calculate/AlarmCacheManager.java:61
- It would be helpful to add a comment explaining the rationale for converting defineId to a string (as rowKey) and the use of a historical key when defineId is null.
this.firingAlertMap.put(rowKey, fingerprint, singleAlert);
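A minimal sketch of the keying scheme the comment above asks to have documented. It is not the PR's code: a nested `Map` stands in for the Guava Table, alerts are plain strings rather than `SingleAlert`, and the class name `AlarmCacheSketch`, the `HISTORICAL_KEY` value, and the read-side fallback are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a nested Map stands in for the Guava Table used in the PR. */
final class AlarmCacheSketch {
    // Assumed placeholder row key for alerts that were stored without a defineId.
    static final String HISTORICAL_KEY = "historical";

    // rowKey (stringified defineId) -> fingerprint -> alert payload
    private final Map<String, Map<String, String>> firing = new HashMap<>();

    /** Stringify defineId so that null ids can share one well-known row. */
    static String rowKey(Long defineId) {
        return defineId == null ? HISTORICAL_KEY : String.valueOf(defineId);
    }

    void putFiring(Long defineId, String fingerprint, String alert) {
        firing.computeIfAbsent(rowKey(defineId), k -> new HashMap<>())
              .put(fingerprint, alert);
    }

    /** Look up under the rule's row first, then fall back to the historical row. */
    String getFiring(Long defineId, String fingerprint) {
        Map<String, String> row = firing.get(rowKey(defineId));
        if (row != null && row.containsKey(fingerprint)) {
            return row.get(fingerprint);
        }
        Map<String, String> historical = firing.get(HISTORICAL_KEY);
        return historical == null ? null : historical.get(fingerprint);
    }

    public static void main(String[] args) {
        AlarmCacheSketch cache = new AlarmCacheSketch();
        cache.putFiring(7L, "fp-a", "cpu high");
        cache.putFiring(null, "fp-old", "legacy alert");
        System.out.println(cache.getFiring(7L, "fp-a"));
        System.out.println(cache.getFiring(7L, "fp-old"));
    }
}
```

Using a string row key lets alerts without a defineId live in the same structure as rule-bound alerts, at the cost of reserving one key value that must never collide with a real id.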
Hi, thanks for the very thoughtful PR. Sorry, I only took a quick look and haven't studied this PR in depth. One question: would it be possible to make defineId a member of the labels? Then we wouldn't need to handle it specially, because we need to consider not only our internal alarms but also external ones (which don't have a define_id).
Hi, thanks for the reply. In response to a couple of your questions:
I reconsidered, if I take into account external alerts, i.e. |
@tomsun28 hi, I reorganized the query logic for
|
Hi, since this involves basic alerts, we will need to review it slowly, sorry. I will review it over the weekend.
Thank you for your PR. I noticed that there are currently three ways to define |
Of course: 1. define.getId() 2. getCustomKey(fingerprint) 3. singleAlert.getLabels().get(CommonConstants.LABEL_DEFINE_ID)
I'm adding to that is: |
From implementation perspective, there are no problems.
LGTM.
LGTM!
LGTM!
What's changed?
To make the semantics of the PromQL parser clearer, the original scheme of identifying alerts by fingerprint alone is gradually becoming unsuitable for the subsequent processing logic. In particular, if no `value = null` data is returned, it is not possible to find the alert and send its recovery notification using the original fingerprint information. The refactoring scheme is therefore as follows:

Extend the original `firingAlertMap` and `pendingAlertMap` storage scheme with `defineId`. In the existing architecture, AlertDefine rules are essential; even a special alert notification can be implemented by defining a custom alert rule (`AlarmDefineCommonEnum`).

If an alarm or recovery occurs for a single indicator under a rule, it is found through its alarm definition. If all the indicators under the rule are back to normal, the whole row of data can be processed through the alarm definition. The specific process is shown below:

Firing info

Data on recovery indicators are as follows

Recovery logic processing
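
The two recovery paths described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: a nested `Map` replaces the Guava Table, alerts are plain strings, and `RuleRecoverySketch`, `recoverOne`, and `recoverAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-indicator versus whole-rule recovery. */
final class RuleRecoverySketch {
    // stringified defineId -> fingerprint -> firing alert payload
    private final Map<String, Map<String, String>> firing = new HashMap<>();

    void fire(long defineId, String fingerprint, String alert) {
        firing.computeIfAbsent(String.valueOf(defineId), k -> new HashMap<>())
              .put(fingerprint, alert);
    }

    /** A single indicator recovered: remove only its cell and return it (or null). */
    String recoverOne(long defineId, String fingerprint) {
        String key = String.valueOf(defineId);
        Map<String, String> row = firing.get(key);
        if (row == null) {
            return null;
        }
        String removed = row.remove(fingerprint);
        if (row.isEmpty()) {
            firing.remove(key); // drop the row once nothing is firing under the rule
        }
        return removed;
    }

    /** All indicators under the rule are normal: drop the whole row at once. */
    List<String> recoverAll(long defineId) {
        Map<String, String> row = firing.remove(String.valueOf(defineId));
        return row == null ? new ArrayList<>() : new ArrayList<>(row.values());
    }

    int firingCount(long defineId) {
        Map<String, String> row = firing.get(String.valueOf(defineId));
        return row == null ? 0 : row.size();
    }

    public static void main(String[] args) {
        RuleRecoverySketch cache = new RuleRecoverySketch();
        cache.fire(1L, "host-a", "cpu high on host-a");
        cache.fire(1L, "host-b", "cpu high on host-b");
        System.out.println(cache.recoverOne(1L, "host-a"));
        System.out.println(cache.recoverAll(1L));
    }
}
```

The whole-row path matters when the query returns no data at all: there is then no fingerprint to match on, but the rule's id still identifies every alert that should be resolved.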

Modification details:
- Query by `defineId`; if not, query the historical ones.
- The `HZB_ALERT_SINGLE.define_id` field has been added, along with migration scripts.

Checklist
Add or update API
1. Periodic
2. RealTime
3. Collector