
Cache use discussion #5094

darkowlzz opened this issue Dec 2, 2024 · 9 comments

@darkowlzz
Contributor

darkowlzz commented Dec 2, 2024

This issue is for discussing various considerations regarding cache use across
Flux in order to have coherency and consistency in the design. The following
sections will provide some background and introduction to various topics with
open-ended questions. These will be updated as we discuss, find conclusions and
come to agreements on the design. The goal is to have a well defined
specification at the end.

Background

At present, the source-controller contains a Helm index cache for storing the
parsed index of a Helm repository once downloaded from remote. This cached index
is loaded from the cache and used in further reconciliations by the HelmChart
reconciler for building a chart. This helps keep the memory usage of
source-controller low. Similarly, there's a need to have caches in various Flux
components to improve performance, resource usage and interaction with external
systems.

The cache in source-controller at present is based on
https://github.com/patrickmn/go-cache. There are new generic cache
implementations available at https://github.com/fluxcd/pkg/tree/main/cache for
expiring and least recently used cache types.
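
To make the two types concrete, below is a minimal, illustrative sketch of a
generically typed expiring cache in Go. It is not the fluxcd/pkg/cache API; the
type and method names are hypothetical and only show the expiring semantics
(per-item TTL plus a periodic purge) that the sections below rely on. An LRU
cache differs in that it evicts the least recently used item once a size limit
is reached, instead of expiring items by time.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Expiring is a hypothetical generically typed cache where every item carries
// its own expiration time. It is NOT the fluxcd/pkg/cache API, just an
// illustration of the expiring semantics.
type Expiring[V any] struct {
	mu    sync.Mutex
	items map[string]item[V]
}

type item[V any] struct {
	value     V
	expiresAt time.Time
}

func NewExpiring[V any]() *Expiring[V] {
	return &Expiring[V]{items: map[string]item[V]{}}
}

// Set stores a value under key with the given time-to-live.
func (c *Expiring[V]) Set(key string, value V, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = item[V]{value: value, expiresAt: time.Now().Add(ttl)}
}

// Get returns the value if present and not yet expired.
func (c *Expiring[V]) Get(key string) (V, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	it, ok := c.items[key]
	if !ok || time.Now().After(it.expiresAt) {
		var zero V
		return zero, false
	}
	return it.value, true
}

// Purge removes expired items; a controller would call this on a purge
// interval (compare the --helm-cache-purge-interval flag).
func (c *Expiring[V]) Purge() {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now()
	for k, it := range c.items {
		if now.After(it.expiresAt) {
			delete(c.items, k)
		}
	}
}

func main() {
	tokens := NewExpiring[string]()
	tokens.Set("gitrepository/default/app", "token-abc", 10*time.Minute)
	if v, ok := tokens.Get("gitrepository/default/app"); ok {
		fmt.Println("reusing cached token:", v)
	}
}
```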

Cache use cases

Authentication token caching

Flux integrates with various cloud providers and platforms that provide
short-lived authentication tokens. At present, every time a Flux component needs
to authenticate with a service, it obtains a new token, uses it and discards it.
These tokens are short-lived but can be used multiple times within their short
expiry period. Requesting a new token at every reconciliation interval could
result in too many requests to the token provider service and potentially lead
to rate limiting. Flux components should be able to obtain tokens, cache them
for as long as they are valid and reuse them where possible.
GitRepository, OCIRepository, HelmRepository(OCI) and Kustomization(for
obtaining secrets from Key Management Service(KMS)) would benefit from
this.
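
A hedged sketch of the intended usage pattern follows: check the cache first
and only request a new token from the provider on a miss or after expiry. The
fetchToken callback, the key format and the map-based store are illustrative
placeholders, not the actual Flux implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cachedToken is an illustrative entry pairing a token with its expiry.
type cachedToken struct {
	token     string
	expiresAt time.Time
}

var (
	mu     sync.Mutex
	tokens = map[string]cachedToken{} // hypothetical in-memory token cache
)

// getToken reuses a cached token while it is still valid and otherwise
// requests a new one from the provider. fetchToken is a placeholder for the
// provider-specific token exchange.
func getToken(key string, fetchToken func() (string, time.Time, error)) (string, error) {
	mu.Lock()
	if t, ok := tokens[key]; ok && time.Now().Before(t.expiresAt) {
		mu.Unlock()
		return t.token, nil // reuse within the token's validity period
	}
	mu.Unlock()

	tok, expiresAt, err := fetchToken() // one request instead of one per reconciliation
	if err != nil {
		return "", err
	}
	mu.Lock()
	tokens[key] = cachedToken{token: tok, expiresAt: expiresAt}
	mu.Unlock()
	return tok, nil
}

func main() {
	fetch := func() (string, time.Time, error) {
		// Placeholder for a real cloud provider token exchange.
		return "short-lived-token", time.Now().Add(15 * time.Minute), nil
	}
	tok, _ := getToken("ocirepository/default/app", fetch)
	fmt.Println(tok)
}
```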

Questions:

  • Is there a difference between the short-lived tokens and the secrets from KMS?
    If the KMS secrets are considerably long-lived, should we intentionally set a
    short expiry time for them to avoid storing sensitive data for long?

Image tag caching

image-reflector-controller downloads image tags and stores them in a badger
database. The image tags in the database remain in the database as long as the
controller runs. There is no provision to clear the unused image tags that are
present in the database. There is also an overhead in running the badger
database that results in relatively high memory usage by the controller. The
database can be replaced by a Least Recently Used(LRU) cache which can store
image tags, similar to the database, and also clear the least recently used
image tags from the cache when a certain cache size limit is reached.

Saving and reloading the cache data

In large-scale deployments, being able to save the cache to disk in case of a
restart, and to reload it to resume operations, would be ideal.

In case of authentication tokens, if a Flux component restarts, it'll reconcile
every single object, irrespective of its reconciliation interval. This would
result in a surge in requests for new authentication tokens, which can overwhelm
the token provider service. Being able to persist the cached tokens and reuse
them would help avoid the surge in requests to the token provider service.

In case of image tags, it wouldn't be much different from using persistent
storage for the badger database. Being able to save and reload the cache would
ensure that the data persistence feature continues to work as it did before
using a cache.

NOTE: This section exists just to clarify the intent. The details of the
implementation will be discussed separately in the future. This feature is
intended to be added later.

Cache key

Since Flux supports multi-tenancy, the cache usage needs to respect the
multi-tenancy boundaries. Tenants should not be able to access other tenants'
cached data. The cache data is accessed through the cache keys. A tenant should
not be able to configure a Flux resource such that it uses the cached data of a
resource that belongs to another tenant.

In case of the existing Helm index cache, the cache key used is the
HelmRepository artifact path, which is
<kind>/<namespace>/<name>/<artifact-digest>. For example
helmrepository/default/podinfo/index-83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111.yaml.
This makes sure that resources in one namespace can't use the key of a resource
in another namespace. Within the same namespace, two artifacts with the same
index but different resource names will not be able to share the same cached
item.
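
For illustration, a small sketch of how such an artifact-path style key could
be composed; the helper name is hypothetical and not the actual
source-controller code.

```go
package main

import "fmt"

// helmIndexCacheKey composes the artifact-path style key described above:
// <kind>/<namespace>/<name>/<artifact-digest>. The function name is
// illustrative, not the actual source-controller helper.
func helmIndexCacheKey(kind, namespace, name, digest string) string {
	return fmt.Sprintf("%s/%s/%s/index-%s.yaml", kind, namespace, name, digest)
}

func main() {
	fmt.Println(helmIndexCacheKey("helmrepository", "default", "podinfo",
		"83a3c595163a6ff0333e0154c790383b5be441b9db632cb36da11db1c4ece111"))
}
```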

Questions:

  • Depending on the implementation, constructing keys by appending different
    input strings leaves the possibility of a key being trimmed at the key data
    type's capacity, creating the possibility of conflicting keys for different
    resources. Is this a good reason to use the resource's unique identifier (UID)
    instead? The UIDs of the resources are generated by the kube-apiserver and
    can't be manipulated by user input. It may not be critical to change the key
    for the Helm index cache now, but the auth token cache is more sensitive.

Cloud Provider Workload identity

Flux components support workload identity for obtaining short-lived
authentication tokens using the ServiceAccount of the pod. A pod can have only
one identity that's shared by all the Flux resources managed by the pod. Using a
single common cache key, like the cloud provider name, would work.

There's a need to allow multi-tenant OIDC authentication by specifying the
ServiceAccount to be used by the resource, refer to #5022. This would allow every
resource to have a unique identity. Considering this, it would be better to have
a unique cache key for these multi-tenant authentication tokens or resources.

Questions:

  • The unique cache key for multi-tenant auth tokens could be based on the
    ServiceAccount, as that contains the identity of the resource, or it
    could also be unique for every resource. If it's based on the ServiceAccount,
    two resources referring to the same ServiceAccount identity can share the
    cached token. Otherwise, in case of a unique key for every resource, the cache
    is not shared with any other resource, but it is still reused by the same
    resource. Which strategy would be better? Are there downsides to one or the
    other?
  • Regardless of whether the key is based on ServiceAccount or the resource,
    using the UID of these objects would provide a unique key value which is a
    fixed reference to the object. There's a caution expressed in
    Implement new packages auth, azure and git for passwordless authentication scenarios pkg#789 (comment) regarding this
    for cache invalidation when the object identity is removed or changed. The
    first time there's a change, the old cached data will be used. This was
    previously addressed in
    Implement new packages auth, azure and git for passwordless authentication scenarios pkg#789 (comment) by considering
    the cache as a best-effort solution. When the old cached data usage results in
    a failure, a new token using the changed identity is obtained and cached for
    the same resource. Is this sufficient? Or do we need to add a mechanism to
    determine whether the identity has changed? It may be difficult to know
    without reading the content of the identity, as the identity can change for
    the same object name and metadata.

The above questions would help decide the designs for both the OCI and Git
clients' cache use with workload identity.
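
The best-effort approach referenced above can be sketched as: try the cached
token first, and if the operation fails with an authentication error, evict the
stale entry, fetch a fresh token using the current identity and retry once. All
names below are hypothetical placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// tokenCache is a minimal placeholder interface for whichever cache
// implementation ends up being used.
type tokenCache interface {
	Get(key string) (string, bool)
	Set(key string, token string)
	Delete(key string)
}

var errUnauthorized = errors.New("unauthorized")

// withBestEffortToken tries the cached token first. If the operation fails
// with an auth error (e.g. because the identity behind the key changed), the
// stale entry is evicted, a fresh token is fetched and the operation is
// retried once with the new token.
func withBestEffortToken(c tokenCache, key string,
	fetch func() (string, error), op func(token string) error) error {

	if tok, ok := c.Get(key); ok {
		if err := op(tok); err == nil || !errors.Is(err, errUnauthorized) {
			return err // success, or a failure unrelated to authentication
		}
		c.Delete(key) // cached token is no longer valid for this identity
	}

	tok, err := fetch() // token exchange using the current identity
	if err != nil {
		return err
	}
	c.Set(key, tok)
	return op(tok)
}

// mapCache is a trivial tokenCache used only to make the example runnable.
type mapCache map[string]string

func (m mapCache) Get(k string) (string, bool) { v, ok := m[k]; return v, ok }
func (m mapCache) Set(k, v string)             { m[k] = v }
func (m mapCache) Delete(k string)             { delete(m, k) }

func main() {
	c := mapCache{"sa-uid-1234": "stale-token"}
	err := withBestEffortToken(c, "sa-uid-1234",
		func() (string, error) { return "fresh-token", nil },
		func(tok string) error {
			if tok == "stale-token" {
				return errUnauthorized
			}
			return nil
		})
	fmt.Println("err:", err, "cached:", c["sa-uid-1234"])
}
```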

OAuth identity

For Git authentication on platforms like GitHub and GitLab specifically, OAuth
is used to obtain short-lived tokens. Unlike with workload identity, in this
case every resource refers to a Secret containing the authentication material
needed to perform a token exchange. The Secret reference is comparable to the
ServiceAccount identity in the workload identity case mentioned above. A similar
set of questions and arguments applies here regarding the cache key value to
use.

NOTE: Previously, there had been discussions about using the resource URL or
the account ID as the cache key. These are not multi-tenant friendly. The above
considerations of using a resource-specific or ServiceAccount/Secret-specific
unique key help maintain multi-tenancy by not allowing resources to access
others' cached values by changing the account or repository name in the input.

Cache capacity full behavior

In case of an LRU cache, as new items get added, the oldest ones are evicted
from the cache, so the user of the cache can always continue to write to it. But
in case of an expiring cache, the caller needs to handle the scenario where the
cache is full. In most cases, the caller should ignore the failure and continue
functioning. The item that would usually be stored in the cache won't be cached
until existing items expire and the cache has capacity for new items. If needed,
cluster admins can increase the cache size through a flag.
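
A sketch of the caller-side handling described above, assuming the expiring
cache returns an error when it is full: the caller ignores the error and
continues with the freshly obtained value, simply losing the benefit of caching
until capacity frees up. The bounded cache below is a stand-in, not the
fluxcd/pkg/cache implementation.

```go
package main

import (
	"errors"
	"fmt"
)

var errCacheFull = errors.New("cache is full") // illustrative sentinel error

// boundedCache is a stand-in for an expiring cache with a max size; it
// rejects writes once the capacity is reached instead of evicting items.
type boundedCache struct {
	maxSize int
	items   map[string]string
}

func (c *boundedCache) Set(key, value string) error {
	if _, exists := c.items[key]; !exists && len(c.items) >= c.maxSize {
		return errCacheFull
	}
	c.items[key] = value
	return nil
}

func main() {
	c := &boundedCache{maxSize: 1, items: map[string]string{"a": "1"}}

	value := "freshly-fetched-token"
	// Best-effort caching: a full cache is not a fatal condition. The caller
	// keeps using the value it just obtained and moves on; the item will be
	// cached again once existing items expire and capacity frees up.
	if err := c.Set("b", value); err != nil {
		if errors.Is(err, errCacheFull) {
			fmt.Println("cache full, continuing without caching")
		}
	}
	fmt.Println("using value:", value)
}
```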

Questions:

  • How should a cache full event be made visible to users and admins?
  • Should there be a metric that shows how full the cache is? At present, there
    is a cached_items metric which shows the total number of items in the cache.
    The total size can also be exported, which can be used to determine the
    available capacity.
  • In case of multiple tenants, should there be a consideration to prevent one
    tenant's cache usage from interfering with another tenant's usage?

Cache organization

The caches will be used for storing values of different types:

  • Helm repository index cache value will be of IndexFile type
  • Authentication token cache value will be of string type
  • Image tags cache value will be a string slice

Since the cache is generically typed, a cache has to be instantiated separately
for each use case. The Helm repository index, authentication token and image
tags caches have to be independent caches due to the data type of the value they
store.

Authentication token caches store string values. This cache use is common
amongst some resources, specifically, in the case of source-controller, the
GitRepository, HelmRepository(OCI) and OCIRepository auth tokens. They could
potentially use a single cache instance, but for separation of concerns and
independence they will have separate cache instances. This also prevents
resources of one type from interfering with resources of other types through a
shared cache.
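
As a sketch of this organization, assuming a generic Cache[V] type (the
placeholder below is not the actual fluxcd/pkg/cache API), each value type and
each concern gets its own instance:

```go
package main

import "fmt"

// Cache is a minimal placeholder for a generically typed cache; the real
// implementation lives in fluxcd/pkg/cache and has a different API.
type Cache[V any] struct{ items map[string]V }

func New[V any]() *Cache[V] { return &Cache[V]{items: map[string]V{}} }

func (c *Cache[V]) Set(key string, v V) { c.items[key] = v }

// IndexFile stands in for Helm's repo.IndexFile type.
type IndexFile struct{ Entries map[string][]string }

func main() {
	// One instance per value type and per concern: the caches are kept
	// independent rather than shared across resource kinds.
	helmIndexCache := New[*IndexFile]() // Helm repository index cache
	gitAuthCache := New[string]()       // GitRepository auth token cache
	ociAuthCache := New[string]()       // OCIRepository auth token cache
	helmOCIAuthCache := New[string]()   // HelmRepository(OCI) auth token cache
	imageTagsCache := New[[]string]()   // image tags cache (string slice values)

	gitAuthCache.Set("gitrepository/default/app", "token-a")
	ociAuthCache.Set("ocirepository/default/app", "token-b")
	helmOCIAuthCache.Set("helmrepository/default/podinfo", "token-c")
	imageTagsCache.Set("imagerepository/default/app", []string{"6.4.0", "6.5.0"})
	helmIndexCache.Set("helmrepository/default/podinfo/index-<digest>.yaml",
		&IndexFile{Entries: map[string][]string{"podinfo": {"6.5.0"}}})

	fmt.Println(len(gitAuthCache.items), len(ociAuthCache.items),
		len(helmOCIAuthCache.items), len(imageTagsCache.items),
		len(helmIndexCache.items))
}
```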

Cache metrics

The new generic cache is self-instrumenting and automatically exports metrics if
a metrics registerer is configured. In order to have unique metrics for each of
the caches, unique prefixes are used. For example, for Git auth token cache, the
prefix gotk_git_auth_ will be used, resulting in metrics like
gotk_git_auth_cache_events_total, gotk_git_auth_cached_items, etc. See
fluxcd/source-controller#1644 for further details.

The following cache metrics prefixes will be used:

  • Helm repository index cache gotk_helm_index_
  • Git authentication token cache gotk_git_auth_
  • OCI authentication token cache gotk_oci_auth_
  • Helm OCI authentication token cache gotk_helm_oci_auth_
  • Kustomize authentication token cache gotk_kms_auth_
  • Image tags cache gotk_image_tags_

Questions:

  • Is gotk_kms_auth_ accurate for how it'll be used?
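
As an illustration of the prefixing scheme, the sketch below registers two
caches' metrics under different prefixes using the Prometheus client library
directly. The real metrics are registered by the cache package itself, so the
exact metric and label names may differ.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// newCacheMetrics registers illustrative cache metrics under the given prefix,
// e.g. "gotk_git_auth_" yields gotk_git_auth_cache_events_total and
// gotk_git_auth_cached_items. This only shows the naming scheme; the actual
// metrics are registered by the fluxcd/pkg/cache package itself.
func newCacheMetrics(reg prometheus.Registerer, prefix string) (*prometheus.CounterVec, prometheus.Gauge) {
	events := prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: prefix + "cache_events_total",
		Help: "Total number of cache events, partitioned by event type.",
	}, []string{"event_type"})
	items := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: prefix + "cached_items",
		Help: "Current number of items in the cache.",
	})
	reg.MustRegister(events, items)
	return events, items
}

func main() {
	reg := prometheus.NewRegistry()
	// One prefix per cache keeps the metric names unique.
	gitEvents, gitItems := newCacheMetrics(reg, "gotk_git_auth_")
	ociEvents, ociItems := newCacheMetrics(reg, "gotk_oci_auth_")

	gitEvents.WithLabelValues("cache_hit").Inc()
	gitItems.Set(1)
	ociEvents.WithLabelValues("cache_miss").Inc()
	ociItems.Set(0)

	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	_ = http.ListenAndServe(":8080", nil)
}
```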

Cache default configurations

The existing Helm index cache is an expiring cache with the following default
configurations:

  • max size - 0 (disabled by default)
  • item TTL - 15 minutes
  • purge interval - 1 minute

For authentication token cache, the item TTL can be set automatically based on
the response from the token provider service.
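
A small sketch of deriving the TTL from the provider response: the expiry time
returned with the token is converted into a TTL with a safety margin, so a
token is never served from the cache right at its expiration. The margin value
is an assumption for illustration, not a decided default.

```go
package main

import (
	"fmt"
	"time"
)

// ttlFromExpiry derives a cache TTL from the token expiry returned by the
// provider, keeping a small safety margin so a token is not served from the
// cache right as it expires.
func ttlFromExpiry(expiresAt time.Time, margin time.Duration) time.Duration {
	ttl := time.Until(expiresAt) - margin
	if ttl < 0 {
		return 0 // already (nearly) expired: do not cache
	}
	return ttl
}

func main() {
	expiresAt := time.Now().Add(15 * time.Minute) // e.g. from the provider response
	fmt.Println(ttlFromExpiry(expiresAt, 30*time.Second))
}
```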

The following default configurations will be used for auth token cache for
GitRepository, HelmRepository(OCI) and OCIRepository:

  • max size - ?
  • purge interval - ?

The following default configurations will be used for auth token cache for
Kustomization:

  • max size - ?
  • purge interval - ?

The image tags cache will use an LRU cache which only supports setting a max
size. The default size will be _ ?

Questions:

  • Does the KMS secret return any expiration time?
  • Should the auth token cache be allowed to have a user defined override for
    TTL?

Exposing cache configuration controls

For the existing Helm index cache, the following command-line flags are provided
for configuring it:

  • --helm-cache-max-size - for setting the cache max size
  • --helm-cache-ttl - for setting the time to live of cached items
  • --helm-cache-purge-interval - for setting the cache purge/cleanup interval
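
For reference, the sketch below declares the existing Helm index cache flags
with the standard library flag package, using the defaults listed in the
previous section. The controllers use their own flag handling, so this only
illustrates the shape of the configuration surface; the flags for the auth
token caches are intentionally left open above.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

func main() {
	// Illustrative declaration of the existing Helm index cache flags. The
	// defaults match the ones listed in the previous section; the help text
	// is illustrative wording, not the controller's actual usage strings.
	helmCacheMaxSize := flag.Int("helm-cache-max-size",
		0, "Maximum number of items in the Helm index cache. 0 disables caching.")
	helmCacheTTL := flag.Duration("helm-cache-ttl",
		15*time.Minute, "Time to live of an item in the Helm index cache.")
	helmCachePurgeInterval := flag.Duration("helm-cache-purge-interval",
		1*time.Minute, "Interval at which expired items are purged from the cache.")
	flag.Parse()

	fmt.Println(*helmCacheMaxSize, *helmCacheTTL, *helmCachePurgeInterval)
}
```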

Based on the previous section, only the max size and purge interval
configuration flags will be needed for auth token caches.

The following flags will be provided for auth token caches:

  • ?

The following flags will be provided for image tags cache:

  • --image-tags-cache-max-size

Questions:

  • In case of source-controller, should a similar set of flags for GitRepository,
    HelmRepository(OCI) and OCIRepository be provided independently of one
    another? Or should there be a common set of flags for all the auth cache
    configuration? Even though there are separate caches internally, they can be
    presented as a single configuration to the users for simplicity.
@makkes
Member

makkes commented Dec 3, 2024

This is a great initiative for streamlining our caching strategy and also our understanding/documentation of it!

The image tags in the database remain in the database as long as the controller runs.

To be precise, the database is persisted on disk and users have the ability to use a PV for durability between pod restarts.

The database can be replaced by a Least Recently Used(LRU) cache which can store image tags, similar to the database, and also clear the least recently used image tags from the cache when a certain cache size limit is reached.

Just to get clarity here: Are you suggesting to use a pure in-memory cache? I don't think this is what we want for IRC in order to prevent the controller from hammering all registries upon restart.

@stefanprodan
Member

stefanprodan commented Dec 3, 2024

we want for IRC in order to prevent the controller from hammering all registries upon restart

I think right now, upon restart all repos are scanned as the reconciler will fill the queue with all objects.

@makkes
Member

makkes commented Dec 3, 2024

we want for IRC in order to prevent the controller from hammering all registries upon restart

I think right now, upon restart all repos are scanned as the reconciler will fill the queue with all objects.

I don't think that's true, as the shouldScan method would determine whether a repository should be scanned, and that depends on the last scan time. So if the controller comes up (e.g. after a Pod restart), there are tags for the given repository in the database, and the last scan time is not too far back, the repository will not be scanned.

@darkowlzz
Contributor Author

darkowlzz commented Dec 3, 2024

Just to get clarity here: Are you suggesting to use a pure in-memory cache? I don't think this is what we want for IRC in order to prevent the controller from hammering all registries upon restart.

@makkes I'm just putting together here all the partial ideas that I've heard of, as I too wasn't sure about the plan for IRC. I came across this in fluxcd/pkg#778, which states that tags will be stored in the cache and the cache data will be persisted on disk and reloaded on restart. As mentioned above, I don't see any change in behavior compared to what we have today with the badger database if a persistent volume is used, just that the overhead of running badger is replaced by the cache, which stores everything in memory.
Previously, I remember there was a discussion about using just files for storing the tags, similar to SC artifacts. I think we need file based storage as well, as memory usage would increase as more images and their tags are added, if the cache replaces the database. The cache could be used purely as a cache, to keep only the recently used image tags in memory and the rest in a file. But I wonder if this is an actual problem we have today. I have not heard of any reports of tag retrieval from the database being slow. I don't know whether reading the tags from a file would be so slow or resource intensive that we need to add a cache.
I may have missed any previous discussion about the need for a cache in IRC.

@darkowlzz
Contributor Author

We discussed a little about this in the dev meeting today. We will discuss more in detail next week.

We discussed the image-reflector-controller use case. The cache idea for image tags to replace badger came about as a way to address the memory issue seen in low-memory environments, refer https://fluxcd.io/flux/cheatsheets/troubleshooting/#how-do-i-resolve-a-unable-to-open-the-badger-database-that-puts-image-reflector-controller-in-crashloopbackoff. But as mentioned in the last comment, there are trade-offs in moving everything to an in-memory cache. We don't know if the latest versions of badger have made adjustments such that we don't really have this issue anymore; this needs to be verified. Considering the potential issues with moving everything into memory, especially for large deployments with a lot of images and tags, we'll avoid the idea of adding a cache to replace badger for now.
There is potential for moving the image tags to source-controller in the future, maybe as v2 of the ImageRepository API, where the tags can be stored in files and distributed like artifacts. Users have previously expressed the need to query all the tags collected by image-reflector-controller, to build their own custom automation on top of it. This can be discussed separately in the future.

I'll remove references to image-reflector-controller and image tags from the main cache use discussion comment above.

@darkowlzz
Contributor Author

I just added a section Cache capacity full behavior with some questions around it.

@darkowlzz
Contributor Author

In this week's dev meeting, we didn't get to go through all the questions, but we discussed whether there is still a need for the LRU cache code, now that we don't have a plan for a cache in image-reflector-controller.
Stefan proposed that maybe the Helm index cache, which is an expiring cache at present, could use an LRU cache, keeping recently used indices in the cache until they get evicted by items that are used more frequently. This is better for scenarios where the cache TTL is, say, 15 minutes, and the reconciliation interval is, say, 30 minutes. The cached items expire at a fixed interval, regardless of the resources that use them. An LRU cache would make sure that the cache is taken advantage of properly. But the downside of this is storing all the items in memory all the time, which may be okay, considering that the cache is there to help reduce the resources needed to parse the whole index every time. An expiring cache would require more fine tuning of the cache TTL to make its use efficient. An LRU cache makes it likely that an index, once parsed, will be reused.

I'd like to wait for any other opinions or concerns about this idea before adding it to the main comment above. So far, it sounds like a good idea to me, personally.

@matheuscscp
Member

In this week's dev meeting, we didn't get to go through all the questions, but we discussed whether there is still a need for the LRU cache code, now that we don't have a plan for a cache in image-reflector-controller. Stefan proposed that maybe the Helm index cache, which is an expiring cache at present, could use an LRU cache, keeping recently used indices in the cache until they get evicted by items that are used more frequently. This is better for scenarios where the cache TTL is, say, 15 minutes, and the reconciliation interval is, say, 30 minutes. The cached items expire at a fixed interval, regardless of the resources that use them. An LRU cache would make sure that the cache is taken advantage of properly. But the downside of this is storing all the items in memory all the time, which may be okay, considering that the cache is there to help reduce the resources needed to parse the whole index every time. An expiring cache would require more fine tuning of the cache TTL to make its use efficient. An LRU cache makes it likely that an index, once parsed, will be reused.

This reasoning also sounds good to me 👍

@stefanprodan
Member

stefanprodan commented Dec 13, 2024

But the downside of this is storing all the items in memory all the time

Just to clarify this, we're not going to store all the indexes in memory all the time, but only as many items as the cluster admin sets in --helm-cache-max-size. The LRU cache will ensure that the indexes which are most used are prioritised.
