Skip to content
This repository has been archived by the owner on Sep 12, 2023. It is now read-only.

PR 135 makes replicaType updates hard #154

Open
Jeffwan opened this issue Aug 29, 2021 · 3 comments
Open

PR 135 makes replicaType updates hard #154

Jeffwan opened this issue Aug 29, 2021 · 3 comments

Comments

@Jeffwan
Copy link
Member

Jeffwan commented Aug 29, 2021

In this PR, https://github.com/kubeflow/common/pull/135/files, it changes rtype to ReplicaType.

However, it brings some challenges in operator upgrade.

-		expectationPodsKey := expectation.GenExpectationPodsKey(jobKey, rtype)
+		expectationPodsKey := expectation.GenExpectationPodsKey(jobKey, apiv1.ReplicaType(rtype))

Using this line as an example, controller retrieve rtype from pod label. It's lower case at this moment.

https://github.com/kubeflow/tf-operator/blob/ff5aaf10bd8a665cbce72dd2cd1d2c0cd06f329e/pkg/apis/tensorflow/v1/types.go#L77

https://github.com/kubeflow/tf-operator/blob/ff5aaf10bd8a665cbce72dd2cd1d2c0cd06f329e/pkg/apis/pytorch/v1/pytorchjob_types.go#L62

apiv1.ReplicaType(rtype) is inaccurate and meaningless because we define Upper case role like Master in the past,
and this conversion give us a non-exist ReplicaType because it's lower case master.

we have some references like below.

https://github.com/kubeflow/tf-operator/blob/ff5aaf10bd8a665cbce72dd2cd1d2c0cd06f329e/pkg/controller.v1/pytorch/pytorch.go#L46

I feel like all the role should be lower case, we make sure we don't use ReplicaType for comparison. Instead, still use strings.toLower(string(rtype)) which means PR 135 still not helpful?

/cc @MartinForReal @gaocegege @kubeflow/wg-training-leads

@terrytangyuan
Copy link
Member

This indeed feel like unnecessary and complicate the implementations. Kubeflow/common should aim to be easy for downstream operators to implement.

@MartinForReal
Copy link
Member

Maybe more helper functions would resolve comparison issue? I think we can add some type name convention by defining ReplicaType type and add more util functions.

@Jeffwan
Copy link
Member Author

Jeffwan commented Aug 30, 2021

Maybe more helper functions would resolve comparison issue? I think we can add some type name convention by defining ReplicaType type and add more util functions.

@MartinForReal

Yeah, I add something like https://github.com/kubeflow/tf-operator/pull/1388/files#diff-7600abe3663b532d64fec9dae52fcb3b972fb034c53a45fe0030fc61776d1612R56

Did you use kubeflow/common in operator other than kubeflow training operators?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants