Start big shards indexing with delay and load distribution #5811

@sergey-safarov

Summary

I have a large shard that needs to be indexed for a new view. When indexing started, the same shard was indexed on several nodes at the same time.

Image

This could be optimised with the following logic:

  1. If the shard has X pending changes, delay the start of indexing by Y milliseconds. The delay may be longer on a node with high CPU load and shorter on a node with low CPU load;
  2. After the delay, check whether another cluster node has already started indexing the same shard;
  3. If the shard is already being indexed on another node, skip indexing it on this node for now.

The same logic applies to all shards. As a result, shards end up distributed across the nodes for indexing in an effectively random way.
On the next view request, the same shard may be indexed on a different node.
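
To make the proposal concrete, here is a minimal simulation sketch of the three steps. It is not CouchDB code: `Node`, `indexing_delay_ms`, `distribute`, the `claims` registry, and all constants are hypothetical names and values chosen for illustration. It also assumes every node can see which shards other nodes have already started indexing.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_load: float  # assumed scale: 0.0 (idle) .. 1.0 (saturated)


def indexing_delay_ms(pending_changes, cpu_load, rng,
                      ms_per_1k_changes=10.0, max_base_ms=5000.0):
    """Step 1: the delay grows with pending changes and with CPU load.

    The constants are illustrative assumptions, not tuned values.
    """
    base = min(pending_changes / 1000.0 * ms_per_1k_changes, max_base_ms)
    load_factor = 1.0 + cpu_load              # busier node waits longer
    jitter = rng.uniform(0.0, 0.25 * base)    # breaks ties between similar nodes
    return base * load_factor + jitter


def distribute(shards, nodes, rng):
    """Simulate which node ends up indexing each shard.

    `shards` maps shard name -> number of pending changes.
    `claims` stands in for cluster-wide knowledge of running indexers.
    """
    claims = {}
    for shard, pending in shards.items():
        # Every node computes its own delay; nodes wake up in delay order.
        wakeups = sorted(
            (indexing_delay_ms(pending, n.cpu_load, rng), n.name) for n in nodes
        )
        for _delay, node_name in wakeups:
            # Step 2: check whether another node already started this shard.
            if shard in claims:
                continue  # Step 3: skip the shard on this node for now.
            claims[shard] = node_name
    return claims


if __name__ == "__main__":
    rng = random.Random()
    nodes = [Node("node1", 0.2), Node("node2", 0.8), Node("node3", 0.5)]
    shards = {"shard-a": 2_000_000, "shard-b": 150_000, "shard-c": 900_000}
    for shard, owner in distribute(shards, nodes, rng).items():
        print(shard, "->", owner)
```

In this sketch the least loaded node usually wins a shard, while the jitter lets similarly loaded nodes share the work. A real implementation would also need to handle the race where two nodes check at nearly the same moment.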

Desired Behaviour

Nodes distribute the shard indexing tasks among themselves and do not index the same shard on several nodes at the same time.

Additional context

This distributes CPU load across the nodes and smooths out CPU spikes.
