Detect if informers are lagging behind the main API server too much #34

Open
@luxas

Description

Category

Internal refactors/changes

Describe the feature you'd like to request

Informers can lag far behind the API server/etcd state; I've heard of informer lag of 5 minutes being spotted in the wild. We should try to detect this somehow (not necessarily straightforward without watching every API object, as the API server does and uses for consistent list from cache).

It would be useful if the k8s API server exposed some endpoint to query for the most up-to-date resource version (RV) seen for a given groupkind.

The easiest way to do this would probably be to ask for bookmark events, and restart (or similar) if the server hasn't received a bookmark within some (configurable) interval. I don't remember off the top of my head how often the API server usually sends out bookmark events, and their delivery is also not guaranteed, so a missing bookmark might indicate that we have a stale watch, but is not proof of one:

you shouldn't assume bookmarks are returned at any specific interval, nor can clients assume that the API server will send any BOOKMARK event even when requested.

https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks

Describe alternatives you've considered

Additional context

No response

Is this something that you'd be interested in working on?

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change
