[Change Proposal] Allow for distributing data views via integration packages #998

@Alphayeeeet

Summary
In large clusters with many metric/log data streams, queries can become slow because they often scan indices that are unrelated to the active integration. Integrations should be able to ship managed Data Views (formerly index patterns) scoped to their datasets so visualizations and searches use those by default, reducing the set of queried indices and improving performance.

Motivation

  • Many integrations (e.g. Kubernetes) ship dashboards and visualizations that currently query broad index patterns such as metrics-*.
  • In a cluster with dozens of other integrations, a query against metrics-* makes Elasticsearch consider many unrelated indices. Even with a filter on data_stream.dataset (a constant_keyword field), every shard behind the pattern still has to be checked, at least briefly, before it can be ruled out, so a high index/shard count makes queries expensive.
  • If an integration could provide a managed Data View scoped to its own dataset names (for example metrics-kubernetes*), visualizations and saved searches could target that narrower Data View by default. This would limit queries to the relevant data streams/indices and reduce query cost and latency.
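
The narrowing effect described above can be illustrated with a small sketch. The data stream names are hypothetical, and Python's `fnmatchcase` only stands in for Elasticsearch's index-name wildcard resolution, which it approximates for simple `*` patterns.

```python
from fnmatch import fnmatchcase

# Hypothetical data streams in a cluster with several integrations installed.
DATA_STREAMS = [
    "metrics-kubernetes.pod-default",
    "metrics-kubernetes.node-default",
    "metrics-kubernetes.container-default",
    "metrics-system.cpu-default",
    "metrics-system.memory-default",
    "metrics-nginx.stubstatus-default",
    "metrics-docker.cpu-default",
    "logs-kubernetes.container_logs-default",
]


def resolve(pattern: str, names: list[str]) -> list[str]:
    """Return the names that a simple wildcard pattern would match."""
    return [name for name in names if fnmatchcase(name, pattern)]


broad = resolve("metrics-*", DATA_STREAMS)
scoped = resolve("metrics-kubernetes*", DATA_STREAMS)

print(f"metrics-* matches {len(broad)} data streams")
print(f"metrics-kubernetes* matches {len(scoped)} data streams")
```

In this toy cluster the scoped pattern resolves to fewer than half of the data streams the broad one does. In a real cluster with dozens of integrations the gap would likely be much larger.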

Proposal

  • Allow Fleet/Integration packages to include managed Data Views under the package's kibana/saved_objects assets (or a dedicated data_views/ package location).
  • When a package is installed, Kibana should register those Data Views as package-managed and make them available to the package’s dashboards, visualizations and saved searches.
  • Visualizations and dashboards shipped with the package should reference the package-provided Data View by default.
  • Provide clear semantics for package-managed Data Views:
    • They should be identifiable as package-managed so users understand edit restrictions.
    • Packages should be able to upgrade/replace these Data Views during package upgrades.
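
As a rough sketch of what a package-provided Data View could look like, the snippet below builds one as a plain dictionary and prints it as JSON. Kibana stores Data Views as saved objects of type `index-pattern` with `title` and `timeFieldName` attributes. The id scheme, the `build_data_view` helper, and the use of a top-level `managed` flag to mark the object as package-managed are assumptions made for illustration, not an agreed format.

```python
import json


def build_data_view(package: str, pattern: str, time_field: str = "@timestamp") -> dict:
    """Build a Data View saved object as a plain dict (illustrative shape only)."""
    return {
        # Hypothetical id scheme; a stable id would let upgrades replace the object in place.
        "id": f"{package}-metrics-data-view",
        # Data Views are stored under the saved object type "index-pattern".
        "type": "index-pattern",
        # Assumption: a flag like this identifies the object as package-managed.
        "managed": True,
        "attributes": {
            "name": f"[{package}] metrics",
            "title": pattern,
            "timeFieldName": time_field,
        },
        "references": [],
    }


data_view = build_data_view("kubernetes", "metrics-kubernetes*")
print(json.dumps(data_view, indent=2))
```

Keeping the id stable across package versions is one possible answer to the upgrade question: dashboards keep pointing at the same id while the pattern behind it can change.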

Example

  • Kubernetes integration ships dashboards that currently query metrics-*.
  • The package provides a managed Data View metrics-kubernetes* that only matches Kubernetes metric data streams.
  • After installation, package visualizations use metrics-kubernetes* so searches only touch indices belonging to Kubernetes metrics.

Benefits

  • Reduced query fan-out across unrelated indices and shards.
  • Lower query latency and resource usage in clusters with high index counts.

Open questions / discussion points

  • Best package location/format for bundling Data Views (existing kibana/saved_objects vs. new package folder).
  • Migration/upgrade semantics when a package changes its Data View (aliases vs replacing objects).
  • How to present managed Data Views in the UI.
  • Edge cases: multi-dataset packages, packages that must span multiple index name patterns, and cross-package references. Is there a need for multiple Data Views per package?

Implementation notes (suggested)

  • Reuse the existing saved object format for Data Views and register them as package-managed on install.
  • Ensure saved visualizations/dashboards reference the Data View saved object by id, not by pattern text.
  • Add validation tooling in elastic-package to help authors create correct Data Views and references.
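
The validation idea could start as small as the check sketched below. It assumes package assets have already been loaded as saved-object dictionaries with the usual `id`, `type`, and `references` fields. The function name and the sample assets are hypothetical, and this is not an existing elastic-package feature.

```python
def find_unresolved_references(assets: list[dict]) -> list[str]:
    """Report index-pattern references that do not resolve to a Data View in the package."""
    data_view_ids = {a["id"] for a in assets if a["type"] == "index-pattern"}
    problems = []
    for asset in assets:
        for ref in asset.get("references", []):
            if ref["type"] == "index-pattern" and ref["id"] not in data_view_ids:
                problems.append(
                    f'{asset["type"]}/{asset["id"]}: unknown Data View id {ref["id"]!r}'
                )
    return problems


# Hypothetical package contents: one Data View, one asset that references it
# by id, and one that still points at an id the package does not ship.
assets = [
    {"id": "kubernetes-metrics-data-view", "type": "index-pattern", "references": []},
    {
        "id": "k8s-overview",
        "type": "dashboard",
        "references": [
            {"id": "kubernetes-metrics-data-view", "name": "panel_0", "type": "index-pattern"}
        ],
    },
    {
        "id": "k8s-errors",
        "type": "search",
        "references": [{"id": "metrics-*", "name": "source", "type": "index-pattern"}],
    },
]

problems = find_unresolved_references(assets)
for problem in problems:
    print(problem)
```

A check like this only catches dangling ids. Verifying that the referenced pattern is actually scoped to the package's datasets would need a separate rule.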

Please feel free to discuss and correct me if I got anything wrong.

Labels: discuss (issue needs discussion)