This repository was archived by the owner on Aug 2, 2022. It is now read-only.

[BUG] Search on roll up indices with avg is not working when we have empty buckets  #451

@narayananaidup

Describe the bug

Searching rollup indices with an avg aggregation fails when the result contains empty buckets.

For example, if the data is rolled up into the rollup index at a 60-minute granularity and we then search the rollup index at a 5-minute granularity, we hit the error shown below. This is not an ideal customer use case, but empty buckets still need to be handled.

Rollup policy we tried:

curl -XPUT "localhost:9200/_opendistro/_rollup/jobs/latest_stats_roll_up" -H 'Content-Type: application/json' -d'{"rollup":{"enabled":true,"schedule":{"interval":{"period":1,"unit":"Minutes"}},"description":"An example policy that rolls up the sample ecommerce data","source_index":"fmstats_2021-06-03*","target_index":"temp_stats_roll_1","page_size":1000,"delay":0,"continuous":false,"dimensions":[{"date_histogram":{"source_field":"timestamp","fixed_interval":"60m"}},{"terms":{"source_field":"portIdToClusterId"}},{"terms":{"source_field":"alias"}}],"metrics":[{"source_field":"port.rx.packets","metrics":[{"avg":{}},{"sum":{}},{"max":{}},{"min":{}},{"value_count":{}}]}]}}'

Search query we tried:

curl -XPOST "localhost:9200/temp_stats_roll_up/_search?pretty" -H 'Content-Type: application/json' -d'{"size":0,"aggregations":{"daily_numbers":{"terms":{"field":"portIdToClusterId"},"aggregations":{"Sub_dateHistogramAgg":{"date_histogram":{"field":"timestamp","missing":0,"fixed_interval":"5m","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0},"aggregations":{"avgAggportTxPackets":{"avg":{"field":"port.tx.packets"}}}}}}}}'

The problem here is "fixed_interval":"5m" in the search query, which should be greater than the date_histogram interval used in the rollup job's dimensions (60m).

After adding the following change, the query started working. Can you please validate it and fix accordingly?
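To make the mismatch concrete, here is a minimal sketch in plain Java (not plugin code; the class and method names are hypothetical). It assumes each 60-minute rollup interval holds a single rolled-up document per dimension combination, so that document can land in only one of the 5-minute query buckets and, with min_doc_count set to 0, the rest of the buckets in that hour come back empty.

```java
import java.time.Duration;

public class IntervalCheck {
    // Hypothetical helper: how many query buckets fit inside one rollup interval.
    static long bucketsPerRollupInterval(Duration rollup, Duration query) {
        return rollup.toMillis() / query.toMillis();
    }

    // Assumption: one rolled-up document per rollup interval, so at most one
    // of the query buckets in that interval can be non-empty.
    static long emptyBucketsPerRollupInterval(Duration rollup, Duration query) {
        long buckets = bucketsPerRollupInterval(rollup, query);
        return Math.max(0, buckets - 1);
    }

    public static void main(String[] args) {
        Duration rollup = Duration.ofMinutes(60); // fixed_interval in the rollup job
        Duration query = Duration.ofMinutes(5);   // fixed_interval in the search
        System.out.println(emptyBucketsPerRollupInterval(rollup, query));
    }
}
```

When the query interval is at least as large as the rollup interval, the helper returns 0, which matches the observation that the failure only shows up with the finer query interval.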

diff --git a/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/util/RollupUtils.kt b/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/util/RollupUtils.kt
index f3d4b75..937106e 100644
--- a/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/util/RollupUtils.kt
+++ b/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/util/RollupUtils.kt
@@ -254,7 +254,7 @@ fun Rollup.rewriteAggregationBuilder(aggregationBuilder: AggregationBuilder): Ag
                     .combineScript(Script(ScriptType.INLINE, Script.DEFAULT_SCRIPT_LANG,
                             "def d = new long[2]; d[0] = state.sums; d[1] = state.counts; return d", emptyMap()))
                     .reduceScript(Script(ScriptType.INLINE, Script.DEFAULT_SCRIPT_LANG,
-                            "double sum = 0; double count = 0; for (a in states) { sum += a[0]; count += a[1]; } return sum/count", emptyMap()))
+                            "double sum = 0; double count = 0; for (a in states) { if(a!=null && a.length > 0) sum += a[0]; if(a!=null && a.length > 0) count += a[1]; } return count!=0?sum/count:0", emptyMap()))
         }
         is MaxAggregationBuilder -> {
             MaxAggregationBuilder(aggregationBuilder.name)

Error returned for the avg case with the original reduce script:

[root@fmha1 opendistro-index-management]#  curl -XPOST "localhost:9200/temp*/_search?pretty" -H 'Content-Type: application/json' -d'{"size":0,"aggregations":{"daily_numbers":{"terms":{"field":"portIdToClusterId"},"aggregations":{"Sub_dateHistogramAgg":{"date_histogram":{"field":"timestamp","missing":0,"fixed_interval":"5m","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0},"aggregations":{"maxAggportRxPackets":{"avg":{"field":"port.rx.packets"}}}}}}}}'
{
  "error" : {
    "root_cause" : [ ],
    "type" : "search_phase_execution_exception",
    "reason" : "",
    "phase" : "fetch",
    "grouped" : true,
    "failed_shards" : [ ],
    "caused_by" : {
      "type" : "script_exception",
      "reason" : "runtime error",
      "script_stack" : [
        "sum += a[0]; ",
        "^---- HERE"
      ],
      "script" : "double sum = 0; double count = 0; for (a in states) { sum += a[0]; count += a[1]; } return sum/count",
      "lang" : "painless",
      "position" : {
        "offset" : 54,
        "start" : 54,
        "end" : 67
      },
      "caused_by" : {
        "type" : "null_pointer_exception",
        "reason" : null
      }
    }
  },
  "status" : 400
}

Expected behavior

When the date histogram contains empty buckets, the avg aggregation should ignore them and return a placeholder value such as 0 or NA instead of failing.
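The expected behavior can be sketched in plain Java, which Painless closely resembles. This is only an illustration of the null-safe reduce logic, not the plugin's code: the class name is hypothetical, and it assumes (based on the null_pointer_exception above) that an entry in states can be null when a bucket is empty. Each non-null state is taken to be a two-element array of [sum, count], as in the combine script.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SafeAvgReduce {
    // Mirrors the reduce step: average over all [sum, count] states,
    // skipping states that are null or too short to hold both values.
    static Double reduce(List<long[]> states) {
        double sum = 0;
        double count = 0;
        for (long[] a : states) {
            if (a == null || a.length < 2) {
                continue; // empty bucket: nothing to add
            }
            sum += a[0];
            count += a[1];
        }
        if (count == 0) {
            return null; // models the "NA" option; the proposed patch returns 0 instead
        }
        return sum / count;
    }

    public static void main(String[] args) {
        List<long[]> mixed = Arrays.asList(new long[]{10, 2}, null, new long[]{20, 3});
        System.out.println(reduce(mixed));

        List<long[]> onlyEmpty = Collections.<long[]>singletonList(null);
        System.out.println(reduce(onlyEmpty));
    }
}
```

Whether an empty bucket should report 0 or a null/NA value is a design choice: 0 is indistinguishable from a genuine average of zero, while null makes the absence of data explicit to the caller.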

OpenDistro version : 1.13.2.0

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)