Prepare kustomization files for new operator #1322
Comments
@Jeffwan: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Currently, we use |
/good-first-issue |
I will be posting a fix by end of the day or tomorrow. |
Actions taken:
- replaced tf-job-operator => training-operator
- replaced kubeflow-tfjobs- => kubeflow-training-
- moved crds for mxjobs, tfjobs, pytorchjobs and xgboostjobs from config/crd/bases to manifests/base/ and prefixed them with crd_

Ref: kubeflow#1322

Testing steps: To be added

Work in Progress
* 1322: Modified manifests to use all-in-one training-operator WIP

  Actions taken:
  - replaced tf-job-operator => training-operator
  - replaced kubeflow-tfjobs- => kubeflow-training-
  - moved crds for mxjobs, tfjobs, pytorchjobs and xgboostjobs from config/crd/bases to manifests/base/ and prefixed them with crd_

  Ref: #1322

  Testing steps: To be added

  Work in Progress

* 1322: synced up config/manager with manifests

  Training operator was found to be working

<pre>
k -n kubeflow logs -f training-operator-694766989-pp2j4
I0812 21:43:24.739862 1 request.go:645] Throttling request took 1.048945631s, request: GET:https://172.19.0.1:443/apis/networking.k8s.io/v1?timeout=32s
2021-08-12T21:43:25.694Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": ":8080"}
2021-08-12T21:43:25.790Z INFO setup starting manager
2021-08-12T21:43:25.790Z INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
2021-08-12T21:43:25.790Z INFO controller-runtime.manager.controller.tf-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:25.790Z INFO controller-runtime.manager.controller.mxnet-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:25.791Z INFO controller-runtime.manager.controller.pytorchjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:25.791Z INFO controller-runtime.manager.controller.xgboostjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.289Z INFO controller-runtime.manager.controller.xgboostjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.294Z INFO controller-runtime.manager.controller.pytorchjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.589Z INFO controller-runtime.manager.controller.mxnet-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.688Z INFO controller-runtime.manager.controller.tf-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.889Z INFO controller-runtime.manager.controller.tf-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.889Z INFO controller-runtime.manager.controller.pytorchjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.890Z INFO controller-runtime.manager.controller.xgboostjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.890Z INFO controller-runtime.manager.controller.mxnet-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.990Z INFO controller-runtime.manager.controller.xgboostjob-operator Starting Controller
2021-08-12T21:43:26.990Z INFO controller-runtime.manager.controller.tf-operator Starting Controller
2021-08-12T21:43:26.990Z INFO controller-runtime.manager.controller.tf-operator Starting workers {"worker count": 1}
2021-08-12T21:43:26.990Z INFO controller-runtime.manager.controller.pytorchjob-operator Starting Controller
2021-08-12T21:43:26.991Z INFO controller-runtime.manager.controller.xgboostjob-operator Starting workers {"worker count": 1}
2021-08-12T21:43:26.991Z INFO controller-runtime.manager.controller.pytorchjob-operator Starting workers {"worker count": 1}
2021-08-12T21:43:26.991Z INFO controller-runtime.manager.controller.mxnet-operator Starting Controller
2021-08-12T21:43:26.991Z INFO controller-runtime.manager.controller.mxnet-operator Starting workers {"worker count": 1}
</pre>

* 1322: incorporated review comments
  - added all resources in ClusterRole

* 1322: incorporated review comments
  - now controller-gen generates the crds directly in manifests/base instead of config/crd/bases
  - updated setup-training-operator.sh to use manifests/overlays/standalone

* 1322: removed config/crd/bases as it's now getting generated in manifests

* 1322: incorporated review comments related to using separate role files

* 1322: removed image name replacement
/priority p0 |
This can be closed. Leader election can be a separate story; it's not a blocking issue.
Umbrella issue: #1318
We will need a new folder to host manifests for new operators. https://github.com/kubeflow/tf-operator/tree/master/manifests
This will also be used for integration tests.
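As a rough illustration of the layout being asked for, a base kustomization for the combined operator might look something like the sketch below. The resource file names are hypothetical placeholders, not the files that ended up in the repository; only the `manifests/base` location and the `kubeflow` namespace are taken from this thread.

```yaml
# manifests/base/kustomization.yaml (sketch; file names below are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
  # One CRD file per job kind; actual file names may differ.
  - crd_tfjobs.yaml            # hypothetical name
  - crd_pytorchjobs.yaml       # hypothetical name
  - crd_mxjobs.yaml            # hypothetical name
  - crd_xgboostjobs.yaml       # hypothetical name
  # RBAC and the operator Deployment; names are placeholders.
  - cluster-role.yaml          # hypothetical name
  - cluster-role-binding.yaml  # hypothetical name
  - service-account.yaml       # hypothetical name
  - deployment.yaml            # hypothetical name
```

An overlay used for integration tests could then simply point at that base:

```yaml
# manifests/overlays/standalone/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
  - ../../base
```

With a layout like this, `kubectl apply -k manifests/overlays/standalone` would install the operator and its CRDs in one step, assuming the files referenced by the base exist.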
/help