Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

[BUG] {{period_start}} and {{period_end}} seems not respected #389

Open
jngbng opened this issue Aug 18, 2021 · 1 comment
Open

[BUG] {{period_start}} and {{period_end}} seems not respected #389

jngbng opened this issue Aug 18, 2021 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@jngbng
Copy link

jngbng commented Aug 18, 2021

Describe the bug
To receive error log messages, I created an alert using below query and run it on every minutes as mentioned in #187.

image

Extraction query:

{
    "size": 10,
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "@timestamp": {
                            "from": "{{period_start}}||-1m",
                            "to": "{{period_end}}",
                            "include_lower": true,
                            "include_upper": true,
                            "format": "epoch_millis",
                            "boost": 1
                        }
                    }
                },
                {
                    "match": {
                        "log": {
                            "query": "error",
                            "operator": "OR",
                            "prefix_length": 0,
                            "max_expansions": 50,
                            "fuzzy_transpositions": true,
                            "lenient": false,
                            "zero_terms_query": "NONE",
                            "auto_generate_synonyms_phrase_query": true,
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "aggregations": {}
}

Trigger message template:

Period: {{ctx.periodStart}} - {{ctx.periodEnd}}

{{#ctx.results}}
Total hits: {{hits.total.value}} ({{hits.hits.length}}{{hits.hits.size}} shown)

{{#hits.hits}}
{{#_source.log_processed}}
{{_source.@timestamp}} [{{level}}] {{logger}} - {{message}}
{{/_source.log_processed}}
{{/hits.hits}}
{{/ctx.results}}

Question1:

It seems work well, but I got two consecutive emails for one error log.

At 2021-08-17T18:52:

Period: 2021-08-17T18:51:30.629Z - 2021-08-17T18:52:30.629Z

Total hits: 1 (1 shown)

2021-08-17T18:52:22.447Z [ERROR] blah blah

And at 2021-08-17T18:53:

Period: 2021-08-17T18:52:30.629Z - 2021-08-17T18:53:30.629Z

Total hits: 1 (1 shown)

2021-08-17T18:52:22.447Z [ERROR] blah blah

The second email is weird as the matched result is out of time range (2021-08-17T18:52:30.629Z - 2021-08-17T18:53:30.629Z) and is already captured by the first email.

Question2:

As far as I know, {{period_end}} is set to the time when the monitor runs and {{period_start}} is evaluated by{{period_end}} minus interval.

Is it possible that an event is not captured by neither or two consecutive monitor runs?

For example, one monitor runs with period (2021-08-17T18:51:30.629Z - 2021-08-17T18:52:30.629Z)
and the next period runs with period (2021-08-17T18:52:30.929Z - 2021-08-17T18:53:30.929Z)
so that an event occurred at 2021-08-17T18:52:30.800Z is not captured by neither of monitors.
Can this scenario happen?

Other plugins installed
No.

To Reproduce
As explained in Bug section.

Expected behavior
Each monitor trigger's time range should not overlap.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: macOS 11.2.3
  • Browser: chrome
  • Version: 91.0.4472.114

Additional context
Add any other context about the problem here.

@jngbng jngbng added the bug Something isn't working label Aug 18, 2021
@jngbng jngbng changed the title [BUG] question on {{period_start}} and {{period_end}} [BUG] {{period_start}} and {{period_end}} seems not respected Sep 3, 2021
@jngbng
Copy link
Author

jngbng commented Sep 3, 2021

Edit on 2021/09/03:

I found that some events are not captured by monitor at all. I seems that current behavior might work like below.

Trigger time period variable (period_start/end) actual query period
09-03 00:01 09-03 00:00 - 00:01 09-03 00:00 - 00:01
09-03 00:02 09-03 00:01 - 00:02 09-03 00:00 - 00:01 <- causes trigger miss or duplicate trigger
09-03 00:03 09-03 00:02 - 00:03 09-03 00:02 - 00:03

Detailed situation:

Whenever network error (such as time out) happens in my app server, both [Error log] and [Access log] are printed.

Elastic data:

2021-09-03 10:30:15 [Error log] Actual error message
2021-09-03 10:30:15 [Access log] Web server access log whose status-code is 500

Two Monitors are set:

  1. Monitor for error log
  2. Monitor for 5xx status access log

Actual trigger:
For network error events, both monitors should trigger. But sometimes monitor 2 is not triggered whereas monitor 1 does. :(
The query setting seems right because from Edit Monitor page, I replace 'period_start' and 'period_end' with missed period and run, then log data are corrected read.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants