Ensure that cluster upgrade in HA mode is not disruptive and document the shortcomings #1213
Description
Currently, we do not test the availability impact of our upgrade process on the cluster. This means that, to be safe, a user performing an upgrade should first migrate all production traffic to another cluster, to make sure the provided services are not disturbed.
As we support in-place upgrades, we should ensure (test and document) that production traffic is not affected when the cluster is running in an HA setup (3 or 5 controller nodes). This should include testing things like:
- Kubernetes API reading and writing
- End components availability (Dex, Gangway, etc.)
- End application availability (httpbin can be used for testing)
- MetalLB
- Contour
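As a starting point for such tests, a small prober could run against each of the endpoints above for the duration of an upgrade and report how many requests failed. The following is only a sketch: the function names (`http_probe`, `run_probes`, `summarize`) and the one-second interval are assumptions for illustration, not part of any existing tooling.

```python
import time
import urllib.request
from typing import Callable, Dict, List, Tuple


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` succeeds within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        # Connection errors, timeouts and HTTP 4xx/5xx all count as failures.
        return False


def run_probes(probe: Callable[[], bool], count: int,
               interval: float = 1.0) -> List[Tuple[float, bool]]:
    """Call `probe` `count` times, sleeping `interval` seconds between calls."""
    results = []
    for i in range(count):
        results.append((time.time(), probe()))
        if i < count - 1:
            time.sleep(interval)
    return results


def summarize(results: List[Tuple[float, bool]]) -> Dict[str, float]:
    """Compute failure count, longest run of consecutive failures, availability."""
    total = len(results)
    failures = sum(1 for _, ok in results if not ok)
    longest = current = 0
    for _, ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    availability = (total - failures) / total if total else 0.0
    return {
        "total": total,
        "failures": failures,
        "longest_outage": longest,
        "availability": availability,
    }
```

For example, `summarize(run_probes(lambda: http_probe(url), count=600))` would cover roughly ten minutes of an upgrade, where `url` is whatever address httpbin, Dex or Gangway is exposed on in the test cluster. Reporting the longest run of consecutive failures alongside the overall ratio helps distinguish a single dropped request from a real outage.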
We should make sure that all components are configured in HA mode, so that when nodes get drained, etc., services remain operational at all times. If this is not possible for some component (e.g. Prometheus, because of single read-write storage without application-level replication), we should document that.
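A first pass at finding components that are not configured for HA could be to list workloads running with a single replica. The sketch below assumes the input is the `items` list parsed from `kubectl get deployments,statefulsets -A -o json`; the function name and the threshold of 2 replicas are illustrative assumptions. A replica count alone does not prove a component is HA (see the storage caveat above), so this only produces candidates to investigate and document.

```python
from typing import Any, Dict, List


def find_single_replica_workloads(manifests: List[Dict[str, Any]],
                                  min_replicas: int = 2) -> List[str]:
    """Return 'namespace/kind/name' entries for workloads below `min_replicas`."""
    flagged = []
    for manifest in manifests:
        kind = manifest.get("kind")
        # DaemonSets and other kinds have no replica count and are skipped.
        if kind not in ("Deployment", "StatefulSet"):
            continue
        # Kubernetes defaults spec.replicas to 1 when it is omitted.
        replicas = manifest.get("spec", {}).get("replicas", 1)
        if replicas < min_replicas:
            meta = manifest.get("metadata", {})
            flagged.append(
                f'{meta.get("namespace", "default")}/{kind}/'
                f'{meta.get("name", "?")} (replicas={replicas})'
            )
    return flagged
```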
See also #485