This repository has been archived by the owner on Oct 28, 2024. It is now read-only.

✨ Adding CAPI Provisioner to vc-manager #136

Conversation

christopherhein (Contributor)

What this PR does / why we need it:
This adds integration points between VC and CAPN, though really CAPI: it doesn't use any CAPN components and relies solely on CAPI's v1alpha4.Cluster{} resource. The templates/cluster-template-virtualcluster.yaml is a clusterctl --flavor template for auto-configuring the VirtualCluster CR alongside the cluster.

Testing:

  1. Clone CAPI master
  2. Build master clusterctl
  3. Clone this branch
  4. Run a local build: https://github.com/kubernetes-sigs/cluster-api-provider-nested/tree/main/docs#create-docker-images-manifests-and-load-images
  5. cd virtualcluster/
  6. make build-images
  7. kind load docker-image virtualcluster/vn-agent-amd64 && kind load docker-image virtualcluster/syncer-amd64 && kind load docker-image virtualcluster/manager-amd64
  8. kubectl apply -f config/crd/
  9. kubectl apply -f config/setup/all_in_one_capi.yaml
  10. From this branch, run: ../cluster-api/bin/clusterctl generate cluster ${CLUSTER_NAME} --from templates/cluster-template-virtualcluster.yaml | kubectl apply -f -

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #81
Fixes #135

/milestone v0.1.x

@k8s-ci-robot k8s-ci-robot added this to the v0.1.x milestone Jun 16, 2021
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 16, 2021
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 16, 2021
 }

-	return ctrl.Result{}, nil
+	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
christopherhein (Contributor Author)

For now this will just constantly poll for changes until the status is Provisioned; in the long run this should be triggered by the NCP status updating instead.

@@ -0,0 +1,128 @@

christopherhein (Contributor Author)

Apparently the role.yaml wasn't working because the comments were nested under a function. Moving them allows controller-gen to regenerate these; I'm not sure they are 100% correct, but this work can be punted to #130.
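For reference, controller-gen only emits RBAC rules for kubebuilder markers attached to a top-level declaration; markers nested inside a function body are silently ignored. A sketch of where they need to sit, with illustrative groups/resources (the actual set in this PR may differ):

```go
// Place RBAC markers immediately above a top-level declaration (typically
// the Reconcile method), never inside a function body, so controller-gen
// can pick them up and regenerate role.yaml.

//+kubebuilder:rbac:groups=tenancy.x-k8s.io,resources=virtualclusters,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=tenancy.x-k8s.io,resources=virtualclusters/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cluster.x-k8s.io,resources=clusters,verbs=get;list;watch
```

These markers are configuration consumed by `controller-gen rbac:roleName=...`, not executable code.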

 image: virtualcluster/manager-amd64
-imagePullPolicy: Always
+imagePullPolicy: Never
christopherhein (Contributor Author)

This will be handled properly in #130 using kustomize to set them differently based on the env/stage; for now, Never allows us to build images locally and load them into kind for testing.
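The kustomize split described for #130 could look roughly like this overlay patch. The paths, resource kind, and names below are assumptions for illustration, not the repository's actual layout:

```yaml
# config/overlays/dev/kustomization.yaml (illustrative layout)
resources:
  - ../../default
patches:
  - target:
      kind: StatefulSet          # whichever workload runs the manager
      name: vc-manager
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/imagePullPolicy
        value: Never             # dev: use images loaded into kind
```

A production overlay would carry the same patch with `value: Always`, so the base manifest never needs to change between environments.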

github.com/moby/spdystream => github.com/moby/spdystream v0.2.0
k8s.io/api => k8s.io/api v0.21.1
k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.21.1
k8s.io/apimachinery => k8s.io/apimachinery v0.21.1
christopherhein (Contributor Author)

You'll notice this also updates us to v0.21 for the latest controller-runtime version that CAPI v1alpha4 shipped with.

@christopherhein (Contributor Author)

/assign @Fei-Guo @charleszheng44 @weiling61

@christopherhein christopherhein changed the title ✨ Adding CAPI Provisioner tro vc-manager ✨ Adding CAPI Provisioner to vc-manager Jun 16, 2021
@Fei-Guo Fei-Guo left a comment

Awesome work! Nits.

@christopherhein (Contributor Author)

thanks @Fei-Guo updated based on your feedback.

return ctrl.NewControllerManagedBy(mgr).
WithOptions(opts).
For(&tenancyv1alpha1.VirtualCluster{}).
Owns(&clusterv1.Cluster{}).
@Fei-Guo

I think setting an owner reference on the Cluster object is a must now; otherwise, won't the vc-manager miss Cluster state-update events?

christopherhein (Contributor Author)

Sorta. Right now, if the cluster isn't up we blindly retry until successful, which accounts for this case; the code I removed wasn't being used at all, since I didn't set an ownerRef.

christopherhein (Contributor Author)

My concern is that I'm not sure a VC should be the owner of the Cluster. There could be a case where you want to move the Cluster without deleting all the resources, and having an ownerRef would set up GC to delete it.

christopherhein (Contributor Author)

This all happens right - https://github.com/kubernetes-sigs/cluster-api-provider-nested/pull/136/files#diff-607056cc549543fc88ebc9695705629f15915ccaac3161e854845ca5a2f0b548R149-R159

We can definitely improve that too as we figure out the right model for how this wires together.

@Fei-Guo Fei-Guo Jun 16, 2021

My concern is that if the VC is not the Cluster's owner, the event when the cluster becomes ready will not trigger the VC reconciler, because the Cluster has no owner. Then the VC status cannot be updated to ready unless we add a periodic check. Am I missing anything?

christopherhein (Contributor Author)

Yeah, check the link in that last message: if the cluster is in Provisioning (which is auto-set if blank), then I return ctrl.Result{RequeueAfter: 1 * time.Second}, nil.

@Fei-Guo Fei-Guo Jun 16, 2021

I see. Since the VC lifecycle and the cluster lifecycle are completely decoupled now, polling in the VC controller may not be ideal, since it may take a long time until the cluster is ready. It is OK for now; moving forward, we can set up a Cluster Watch manually and implement a custom enqueue struct so that we can add the matching VC to the queue by checking the annotation.
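The manual Watch described here boils down to a map function from a Cluster event to the owning VC's reconcile key, looked up via an annotation instead of an ownerRef. A stdlib-only sketch of that mapping; the annotation key and the Cluster/Request types below are hypothetical stand-ins, not the real controller-runtime handler API:

```go
package main

import "fmt"

// Hypothetical annotation linking a CAPI Cluster back to its VirtualCluster.
const vcAnnotation = "tenancy.x-k8s.io/virtualcluster"

// Request mimics reconcile.Request: the namespace/name to enqueue.
type Request struct{ Namespace, Name string }

// Cluster stands in for clusterv1.Cluster, carrying just what we need here.
type Cluster struct {
	Namespace   string
	Annotations map[string]string
}

// mapClusterToVC is the shape of the custom enqueue func: when a Cluster
// event arrives, return the matching VC request if the annotation is set,
// and nothing otherwise.
func mapClusterToVC(c Cluster) []Request {
	name, ok := c.Annotations[vcAnnotation]
	if !ok {
		return nil
	}
	return []Request{{Namespace: c.Namespace, Name: name}}
}

func main() {
	c := Cluster{
		Namespace:   "default",
		Annotations: map[string]string{vcAnnotation: "vc-sample"},
	}
	// The VC to reconcile whenever this Cluster changes.
	fmt.Println(mapClusterToVC(c))
}
```

In controller-runtime this logic would be plugged in via a Watch with an enqueue-from-map-function handler, so a Cluster becoming ready immediately enqueues the matching VC rather than waiting for the next poll.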

christopherhein (Contributor Author)

Yeah, let me file an issue to track improving this, because I 100% agree.

@Fei-Guo commented Jun 16, 2021

LGTM. I will let Chao give a final approve.

@charleszheng44 (Contributor)

@christopherhein Everything works as expected until I create a pod on the tenant cluster. The pod hangs in the Pending state; it looks like the syncer cannot connect to the tenant control plane. I checked the syncer log and saw the following error message:

I0617 16:08:03.453969       1 syncer.go:383] cluster default shutdown: Get "https://cluster-sample-apiserver:6443/api?timeout=30s": dial tcp 127.0.0.1:6443: connect: connection refused

That's weird, because I was able to port-forward svc/cluster-sample-apiserver 6443:6443.

@christopherhein (Contributor Author)

> @christopherhein Everything works as expected until I create a pod on the tenant cluster. The pod hangs in the Pending state; it looks like the syncer cannot connect to the tenant control plane. I checked the syncer log and saw the following error message:
>
> I0617 16:08:03.453969       1 syncer.go:383] cluster default shutdown: Get "https://cluster-sample-apiserver:6443/api?timeout=30s": dial tcp 127.0.0.1:6443: connect: connection refused
>
> That's weird, because I was able to port-forward svc/cluster-sample-apiserver 6443:6443.

@charleszheng44 It looks like you are using the wrong specs. Can you delete that cluster and redo it, making sure that the Cluster object includes the namespace in the controlPlaneEndpoint.host? You'll see that commented on in templates/cluster-template-virtualcluster.yaml - https://github.com/kubernetes-sigs/cluster-api-provider-nested/pull/136/files#diff-ec9ceefbdef73c7fb4f5a8ef98f5342b423b770ce83d208b185267d54ed2a10eR7-R12
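The gotcha is that the host must be the namespaced service DNS name, so the syncer resolves the tenant apiserver Service instead of falling back to 127.0.0.1. Roughly like the following; the cluster and service names here are illustrative, so check the actual template for the exact values:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha4
kind: Cluster
metadata:
  name: cluster-sample
  namespace: default
spec:
  controlPlaneEndpoint:
    # Must be <service>.<namespace>, not the bare service name, or the
    # syncer's dial resolves to 127.0.0.1 and fails with connection refused.
    host: cluster-sample-apiserver.default
    port: 6443
```

Port-forwarding still works with the bare name because kubectl resolves the Service itself, which is why the connection-refused error above only shows up from inside the syncer.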

if c.ProvisionerName == "capi" {
if err := (&controllers.ReconcileCAPIVirtualCluster{
Client: mgr.GetClient(),
Log: c.Log.WithName("virtualcluster"),
@zhuangqh zhuangqh (Contributor) Jun 17, 2021

Should we change the log name here to distinguish between the different controllers?

@christopherhein christopherhein (Contributor Author) Jun 17, 2021

We're still reconciling VirtualCluster resources, so I don't think so, and they can't be turned on alongside other reconcilers since it's selected via the --master-prov flag. Also, we don't do this with native vs. aliyun:

So I would say no, but I'm open to it.

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: charleszheng44, christopherhein

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [charleszheng44,christopherhein]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@charleszheng44 (Contributor)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 17, 2021
@k8s-ci-robot k8s-ci-robot merged commit ce11f3e into kubernetes-retired:main Jun 17, 2021
Successfully merging this pull request may close these issues:
  • 🌱 Add TargetNamespace to cluster-template.yaml ControlPlaneEndpoint
  • 🌱 Integrate VC and CAPN