[BUG] {{period_start}} and {{period_end}} seems not respected #389

jngbng · 2021-08-18T03:12:00Z

Describe the bug
To receive error log messages, I created an alert using below query and run it on every minutes as mentioned in #187.

Extraction query:

{
    "size": 10,
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "@timestamp": {
                            "from": "{{period_start}}||-1m",
                            "to": "{{period_end}}",
                            "include_lower": true,
                            "include_upper": true,
                            "format": "epoch_millis",
                            "boost": 1
                        }
                    }
                },
                {
                    "match": {
                        "log": {
                            "query": "error",
                            "operator": "OR",
                            "prefix_length": 0,
                            "max_expansions": 50,
                            "fuzzy_transpositions": true,
                            "lenient": false,
                            "zero_terms_query": "NONE",
                            "auto_generate_synonyms_phrase_query": true,
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "aggregations": {}
}

Trigger message template:

Period: {{ctx.periodStart}} - {{ctx.periodEnd}}

{{#ctx.results}}
Total hits: {{hits.total.value}} ({{hits.hits.length}}{{hits.hits.size}} shown)

{{#hits.hits}}
{{#_source.log_processed}}
{{_source.@timestamp}} [{{level}}] {{logger}} - {{message}}
{{/_source.log_processed}}
{{/hits.hits}}
{{/ctx.results}}

Question1:

It seems work well, but I got two consecutive emails for one error log.

At 2021-08-17T18:52:

Period: 2021-08-17T18:51:30.629Z - 2021-08-17T18:52:30.629Z

Total hits: 1 (1 shown)

2021-08-17T18:52:22.447Z [ERROR] blah blah

And at 2021-08-17T18:53:

Period: 2021-08-17T18:52:30.629Z - 2021-08-17T18:53:30.629Z

Total hits: 1 (1 shown)

2021-08-17T18:52:22.447Z [ERROR] blah blah

The second email is weird as the matched result is out of time range (2021-08-17T18:52:30.629Z - 2021-08-17T18:53:30.629Z) and is already captured by the first email.

Question2:

As far as I know, {{period_end}} is set to the time when the monitor runs and {{period_start}} is evaluated by{{period_end}} minus interval.

Is it possible that an event is not captured by neither or two consecutive monitor runs?

For example, one monitor runs with period (2021-08-17T18:51:30.629Z - 2021-08-17T18:52:30.629Z)
and the next period runs with period (2021-08-17T18:52:30.929Z - 2021-08-17T18:53:30.929Z)
so that an event occurred at 2021-08-17T18:52:30.800Z is not captured by neither of monitors.
Can this scenario happen?

Other plugins installed
No.

To Reproduce
As explained in Bug section.

Expected behavior
Each monitor trigger's time range should not overlap.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

OS: macOS 11.2.3
Browser: chrome
Version: 91.0.4472.114

Additional context
Add any other context about the problem here.

The text was updated successfully, but these errors were encountered:

jngbng · 2021-09-03T01:55:42Z

Edit on 2021/09/03:

I found that some events are not captured by monitor at all. I seems that current behavior might work like below.

Trigger time	period variable (`period_start/end`)	actual query period
09-03 00:01	09-03 00:00 - 00:01	09-03 00:00 - 00:01
09-03 00:02	09-03 00:01 - 00:02	09-03 00:00 - 00:01 <- causes trigger miss or duplicate trigger
09-03 00:03	09-03 00:02 - 00:03	09-03 00:02 - 00:03

Detailed situation:

Whenever network error (such as time out) happens in my app server, both [Error log] and [Access log] are printed.

Elastic data:

2021-09-03 10:30:15 [Error log] Actual error message
2021-09-03 10:30:15 [Access log] Web server access log whose status-code is 500

Two Monitors are set:

Monitor for error log
Monitor for 5xx status access log

Actual trigger:
For network error events, both monitors should trigger. But sometimes monitor 2 is not triggered whereas monitor 1 does. :(
The query setting seems right because from Edit Monitor page, I replace 'period_start' and 'period_end' with missed period and run, then log data are corrected read.

jngbng added the bug Something isn't working label Aug 18, 2021

jngbng assigned skkosuri-amzn Aug 18, 2021

jngbng changed the title ~~[BUG] question on {{period_start}} and {{period_end}}~~ [BUG] {{period_start}} and {{period_end}} seems not respected Sep 3, 2021

jmeher2020 mentioned this issue Aug 2, 2024

Allow specifying period size seperately from interval. opensearch-project/alerting#67

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] {{period_start}} and {{period_end}} seems not respected #389

[BUG] {{period_start}} and {{period_end}} seems not respected #389

jngbng commented Aug 18, 2021 •

edited

Loading

jngbng commented Sep 3, 2021

[BUG] {{period_start}} and {{period_end}} seems not respected #389

[BUG] {{period_start}} and {{period_end}} seems not respected #389

Comments

jngbng commented Aug 18, 2021 • edited Loading

jngbng commented Sep 3, 2021

Detailed situation:

jngbng commented Aug 18, 2021 •

edited

Loading