Add option to use etcd managed by cilium-etcd-operator as kvstore #8629
Conversation
Hi @olemarkus. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Force-pushed from 88ab2b1 to 641ab56
hostNetwork: true
{{- if .EtcdManaged }}
# In managed etcd mode, Cilium must be able to resolve the DNS name of
# the etcd service
Did you forget to set the dnsPolicy here?
No, I removed that part on purpose. Changing the DNS policy breaks Cilium because it can no longer look up internal cluster DNS names (since we use the public DNS entry for that). The change on line 14 (`"etcd.operator": "true"`) removes the need to change the DNS policy.
That particular change hasn't made it into the Cilium Helm templates yet, if you are comparing against what those output.
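For reference, a minimal sketch of how the key discussed above might sit in the cilium-config ConfigMap. Only `etcd.operator: "true"` comes from the discussion; the surrounding keys and values are illustrative assumptions, not the PR's actual template:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # From the discussion above: tells the agent that etcd is managed by
  # cilium-etcd-operator, removing the need to change the pod's dnsPolicy.
  etcd.operator: "true"
  # Illustrative only: a typical etcd kvstore configuration.
  kvstore: etcd
  kvstore-opt: '{"etcd.config": "/var/lib/etcd-config/etcd.config"}'
```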
You have a conditional wrapped around nothing but comments here. Perhaps remove the conditional and comments as well?
Ah, thanks. I was certain I had removed that one. 🤦‍♂️
image: "cilium/cilium-etcd-operator:v2.0.7"
name: cilium-etcd-operator
dnsPolicy: ClusterFirst
hostNetwork: true
priorityClassName: system-cluster-critical
Not 100% sure this should qualify for system-cluster-critical. The operator itself isn't very critical, even though the pods it spawns are.
If it gets preempted then it can't repair any problems with the Cilium etcds, such as their losing quorum.
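A minimal sketch of where the suggested field would land on the operator Deployment. Field placement only; everything except `priorityClassName` and the container lines already shown above is elided, and this is a suggestion from the review, not the PR's final template:

```yaml
spec:
  template:
    spec:
      # Suggested above: keeps the operator from being preempted, so it can
      # still repair the Cilium etcd cluster (e.g. after quorum loss).
      priorityClassName: system-cluster-critical
      dnsPolicy: ClusterFirst
      hostNetwork: true
      containers:
        - name: cilium-etcd-operator
          image: "cilium/cilium-etcd-operator:v2.0.7"
```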
Does cilium-etcd-operator run the etcd pods with host network? We forked etcd-operator to include coreos/etcd-operator#2094 and suspect we'll have to fork cilium-etcd-operator too. Also, does cilium-etcd-operator run the etcd pods with the system-cluster-critical priority class? Since etcd-operator appears to be unmaintained, I'm wondering if it would be practical to add a Cilium mode to etcd-manager.
No, it doesn't. It is unclear to me why that would be necessary.
No, and I am not sure it would be correct to do so either. The consequence of losing etcd is that state propagation becomes slower, but it won't lead to immediate critical failure.
That could be interesting. Everything that talks to etcd uses host networking, so etcd-manager's etcd should be reachable too. It would certainly address the concerns above. It would be a pretty big change, though.
With hundreds of nodes, "slower" may well mean "completely unstable". If someone doesn't need etcd for state propagation, it's likely they wouldn't enable it. I suppose in the CRD identity allocation mode an agent could bootstrap off of the apiserver, so running etcd on host network wouldn't be as important as for kvstore mode. It might even be able to do so without killing the apiserver in a large cluster.
Shouldn't we bump the version in bootstrapchannelbuilder.go?
Cilium's stance on this is still that larger environments should use external etcd (https://docs.cilium.io/en/v1.7/gettingstarted/k8s-install-external-etcd/). Forking cilium-etcd-operator/etcd-operator etc. isn't an option for me, at least.
Yep, bumped.
External etcd would be expensive to manage. We run in-cluster etcd on host network with a forked etcd-operator. When we are able to upgrade to 1.6 we'll look at CRD identity allocation mode with etcd for propagation, assuming we're willing to stay under 250 nodes. Elsewhere they've mentioned that pure CRD mode is only good for up to 50 nodes. Host network is not a blocker for this PR. I'm trying to decide whether I want to hold out for priorityClassName on cilium-etcd-operator. If cilium-etcd-operator happens not to put it on the etcd pods, then it'd be a moot point unless/until that could be fixed upstream.
I think we should push this in without it, and then I can see if I can push Cilium to propagate the priority class. I agree that external etcd is too expensive, but I want to see how hard it is to add another etcd-manager cluster. That will have to come later, though.
/lgtm
/retest |
It looks like cilium/cilium-etcd-operator#67 is going to be a significant defect now that kops rolling update can drain nodes in parallel.
Yep. I think I am going to have a look at etcd-manager sooner than I thought. |
Force-pushed from d827401 to 4c5bef8
path: etcd.config
name: cilium-config
name: etcd-config-path
# To read the k8s etcd secrets in case the user might want to use TLS
These are the Cilium etcd secrets, not the k8s ones, right?
Yes. etcd-operator creates the secret `cilium-etcd-secret`.
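For context, a sketch of how the operator-created secret might be wired into the agent pod alongside the etcd config volume shown in the diff above. The volume names `etcd-config-path`, the ConfigMap `cilium-config`, and the secret `cilium-etcd-secret` come from the conversation; the `etcd-secrets` volume name and the use of `optional: true` are illustrative assumptions:

```yaml
volumes:
  # etcd connection config rendered into the cilium-config ConfigMap
  - name: etcd-config-path
    configMap:
      name: cilium-config
      items:
        - key: etcd-config
          path: etcd.config
  # Cilium etcd TLS secrets, created by cilium-etcd-operator
  - name: etcd-secrets
    secret:
      secretName: cilium-etcd-secret
      # Illustrative: tolerate the secret not existing when TLS is unused
      optional: true
```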
upup/models/cloudup/resources/addons/networking.cilium.io/k8s-1.12.yaml.template
…1.12.yaml.template
Co-Authored-By: John Gardiner Myers <[email protected]>
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: olemarkus, rifelpet. The full list of commands accepted by this bot can be found here. The pull request process is described here.
This PR adds the option to use etcd as the kvstore for cilium-agent state. See https://docs.cilium.io/en/v1.7/gettingstarted/k8s-install-etcd-operator/ for the details.
Fixes #8465
Note that CoreDNS 1.6+ currently has a bug that prevents etcd-operator from creating clusters, so if you want to test this, downgrade CoreDNS first. See coredns/coredns#3686.
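Assuming the option surfaces as a field on the Cilium networking spec (the field name here is a guess based on the `.EtcdManaged` template variable used in the diff above, not a confirmed kops API field), enabling it in a kops cluster spec might look roughly like:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  networking:
    cilium:
      # Sketch: run etcd managed by cilium-etcd-operator as the agent kvstore
      etcdManaged: true
```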