* refactor(sdk): added option for custom metric collector for tune in… #2406

prakhar479 · 2024-08-09T23:50:25Z

… katlib_client.py

Signed-off-by: prakhar479 [email protected]

Added a custom_collector field to metrics_collector_config in tune so that users can specify a custom metrics collector, for example Prometheus.

fixes #2402

google-cla · 2024-08-09T23:50:29Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-oss-prow · 2024-08-09T23:50:41Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

prakhar479 · 2024-08-10T00:05:07Z

@Electronic-Waste can you review this and let me know if any changes are required? Thanks a lot!

Electronic-Waste · 2024-08-12T14:03:57Z

@prakhar479 Yes, of course. Thanks for your great contribution to Katib!

I'll look into this PR in the next few days.

andreyvelich

/ok-to-test

andreyvelich · 2024-08-12T14:20:02Z

/rerun-all

prakhar479 · 2024-08-14T14:30:24Z

I have corrected some oversights from my side and need approval for testing. Thanks!

Electronic-Waste · 2024-08-15T03:12:59Z

/rerun-all

Electronic-Waste

Nice job @prakhar479 , I left some comments for you!

Electronic-Waste · 2024-08-15T03:27:02Z

sdk/python/v1beta1/kubeflow/katib/api/katib_client.py

+                for using custom metric collectors use "custom_collector" key,
+                for example, `metrics_collector_config = {"custom_collector": "prometheus"}`.


I guess prometheus is not a custom collector, since we define it here:

katib/pkg/apis/controller/common/v1beta1/common_types.go

Lines 216 to 220 in 8eb0e86

PrometheusMetricCollector CollectorKind = "PrometheusMetric"
DefaultPrometheusPath string = "/metrics"
DefaultPrometheusPort int = 8080

CustomCollector CollectorKind = "Custom"

Here are some relevant resources which may be helpful for you:
https://github.com/kubeflow/katib/blob/master/examples/v1beta1/metrics-collector/custom-metrics-collector.yaml

Btw, custom_collector accepts V1Container as its input, not a str:

katib/sdk/python/v1beta1/kubeflow/katib/models/v1beta1_collector_spec.py

Lines 21 to 43 in 8eb0e86

class V1beta1CollectorSpec(object):
    """NOTE: This class is auto generated by OpenAPI Generator.
    Ref: https://openapi-generator.tech

    Do not edit the class manually.
    """

    """
    Attributes:
      openapi_types (dict): The key is attribute name
                            and the value is attribute type.
      attribute_map (dict): The key is attribute name
                            and the value is json key in definition.
    """
    openapi_types = {
        'custom_collector': 'V1Container',
        'kind': 'str'
    }

    attribute_map = {
        'custom_collector': 'customCollector',
        'kind': 'kind'
    }

Can you add some comments to remind users of this usage?
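The distinction raised above (kind is a str, while custom_collector is a container object rather than a str) could be checked with something like the following. This is only a sketch: check_metrics_collector_config and ASSUMED_KINDS are hypothetical names that do not exist in the Katib SDK, the set of kinds is limited to the ones named in this thread, and a SimpleNamespace stands in for kubernetes.client.V1Container so the snippet runs without the kubernetes package.

```python
from types import SimpleNamespace

# Kinds named in this thread for the tune docstring. This is an assumption,
# not the authoritative list (common_types.go holds the CRD definition).
ASSUMED_KINDS = {"StdOut", "Push", "Custom"}


def check_metrics_collector_config(config):
    """Hypothetical helper: validate a metrics_collector_config dict.

    Returns a list of problems; an empty list means the config looks usable.
    """
    problems = []
    kind = config.get("kind")
    if kind not in ASSUMED_KINDS:
        problems.append(f"unsupported kind: {kind!r}")

    collector = config.get("custom_collector")
    if kind == "Custom":
        if collector is None:
            problems.append("kind 'Custom' requires a custom_collector")
        elif isinstance(collector, str):
            # The mistake from the first revision: {"custom_collector": "prometheus"}.
            problems.append("custom_collector must be a container object, not a str")
    elif collector is not None:
        problems.append("custom_collector is only used when kind is 'Custom'")
    return problems


# Stand-in for kubernetes.client.V1Container (hypothetical name and image);
# real code would pass an actual V1Container instance.
container = SimpleNamespace(name="custom-collector", image="example/collector:latest")

good = {"kind": "Custom", "custom_collector": container}
bad = {"custom_collector": "prometheus"}

print(check_metrics_collector_config(good))
print(check_metrics_collector_config(bad))
```

With these inputs, the first config yields no problems, while the second is flagged both for the missing kind and for passing custom_collector without kind "Custom".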

Also, can you add some e2e tests like this (an example for StdOut Collector)?

katib/test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.py

Line 16 in 8eb0e86

def run_e2e_experiment_create_by_tune(

Thanks @Electronic-Waste, I have made the appropriate changes in the function comments to reflect the correct usage.

For the e2e tests, should I modify the existing run-e2e-tune-api.py test to use metrics_collector_config, or should I create a separate test script for it?

@prakhar479 I think modifying the existing run-e2e-tune-api.py test is better since it's hard to pass the collector config to the test script: https://github.com/kubeflow/katib/blob/master/test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.sh. WDYT👀 @andreyvelich @tenzen-y @johnugeorge

Also a small tip for reference: you can build images you need here

katib/test/e2e/v1beta1/scripts/gh-actions/build-load.sh

Lines 166 to 170 in 8eb0e86

# Testing image for tune function
if "$TUNE_API"; then
  echo -e "\nPulling and building testing image for tune function..."
  _build_containers "suggestion-hyperopt" "$CMD_PREFIX/suggestion/hyperopt/$VERSION/Dockerfile"
fi

Thanks @Electronic-Waste for the helpful insights 😄

Electronic-Waste · 2024-08-15T10:10:15Z

PTAL👀 @andreyvelich @tenzen-y @johnugeorge when you have time.

prakhar479 · 2024-08-17T21:33:50Z

I have updated the comment on the usage of the custom metrics collector parameter, as well as the e2e test for the tune API.
For the e2e test, I modified build-load.sh as suggested by @Electronic-Waste to build a custom metrics collector image from the dummy-collector.py script and the Dockerfile.dummy-collector file. I then modified run-e2e-tune-api.py to pass the custom collector image as a V1Container parameter to the tune API.

I was unsure where to place these new files, so I put all of them in the gh-actions directory. Let me know about any modifications, changes, or fixes I should make.

Electronic-Waste

@prakhar479 Sorry for my late response. I've been busy with other things these past two weeks.

I left a few comments for you. Thanks for your great contribution!

Electronic-Waste · 2024-08-25T15:09:54Z

sdk/python/v1beta1/kubeflow/katib/api/katib_client.py

@@ -251,8 +254,8 @@ def tune(
            pip_index_url: The PyPI url from which to install Python packages.
            metrics_collector_config: Specify the config of metrics collector, 
                for example, `metrics_collector_config = {"kind": "Push"}`.
-                Currently, we only support `StdOut` and `Push` metrics collector.


I think we may need to tell users about the supported types of MC. So can you re-add this line?

Electronic-Waste · 2024-08-25T15:13:01Z

sdk/python/v1beta1/kubeflow/katib/api/katib_client.py

+                for using custom metric collectors use "custom_collector" key and provide instance of custom V1Container as value,
+                for example, `metrics_collector_config = {"kind" : "Custom", "custom_collector": <Instance of V1Container>}`.


Maybe we can reorganize these comments and explain each field in metrics_collector_config? Like:

kind: specify the kind of Metrics Collector (currently we support...)
custom_collector: ...

Electronic-Waste · 2024-08-25T15:20:05Z

test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.py

-    # [3] Create Katib Experiment with 4 Trials and 2 CPUs per Trial.
+    # [3] Create a dummy metric collector (DOES NOT HAVE A IMAGE)
+    metric_collector = V1Container(
+        name="dummy-collector",
+        image="dummy-collector:latest",
+        command=["python", "/app/dummy-collector.py"],
+        args=["--metric-name=result"],
+        env=[
+            client.V1EnvVar(name="EXPERIMENT_NAME", value=exp_name),
+            client.V1EnvVar(name="EXPERIMENT_NAMESPACE", value=exp_namespace)
+        ]
+    )


I guess we can create another function, run_e2e_experiment_create_by_tune_custom, to run e2e tests for the custom collector, rather than deleting the original e2e test for the StdOut collector. Then we can run these e2e tests together in this file :)

WDYT👀 @prakhar479 @andreyvelich @tenzen-y @johnugeorge

Sure, I think it's a good idea. I will make this change in a few days.

I have made corresponding changes to run both e2e tests (one with custom metrics collector and another with default metric collector [StdOut])

Electronic-Waste · 2024-08-25T15:21:33Z

/rerun-all

prakhar479 · 2024-08-29T23:39:51Z

I have made the necessary changes, which should also fix the failing tests. Let me know about any further changes or suggestions @Electronic-Waste @andreyvelich @tenzen-y @johnugeorge.

Electronic-Waste

Thanks for your contribution @prakhar479 . I left some comments for you.

Electronic-Waste · 2024-08-30T13:08:38Z

test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.py

@@ -35,7 +96,7 @@ def objective(parameters):
        "b": search.double(min=0.1, max=0.2)
    }

-    # [3] Create Katib Experiment with 4 Trials and 2 CPUs per Trial.
+    # [4] Create Katib Experiment with 4 Trials and 2 CPUs per Trial.


Why is its number 4? I guess 3 is more suitable.

Electronic-Waste · 2024-08-30T13:10:16Z

test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.py

    try:
-        run_e2e_experiment_create_by_tune(katib_client, exp_name, exp_namespace)
+        exp_name = "tune-example-default-metrics-collector"
+        logging.info(f"Runnning E2E for Experiment created by tune: {exp_namespace}/{exp_name}")
+        run_e2e_experiment_create_by_tune_default_metrics_collector(katib_client, exp_name, exp_namespace)
+        logging.info("---------------------------------------------------------------")
+        logging.info(f"E2E is succeeded for Experiment created by tune: {exp_namespace}/{exp_name}")
+
+        exp_name = "tune-example-custom-metrics-collector"
+        logging.info(f"Runnning E2E for Experiment created by tune: {exp_namespace}/{exp_name}")
+        run_e2e_experiment_create_by_tune_custom_metrics_collector(katib_client, exp_name, exp_namespace)


Maybe splitting these two e2e tests into separate try-except blocks would make errors easier to identify :)

Electronic-Waste · 2024-08-30T13:12:53Z

test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.py

@@ -93,4 +162,4 @@ def objective(parameters):
        # Delete the Experiment.
        logging.info("---------------------------------------------------------------")
        logging.info("---------------------------------------------------------------")
-        katib_client.delete_experiment(exp_name, exp_namespace)
+        katib_client.delete_experiment(exp_name, exp_namespace)


I think we need to delete the former experiment before we run another experiment. Otherwise, we may run into an "xxx experiment already exists" error.
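The two suggestions above (one try block per e2e case, and deleting each Experiment before the next one starts) might be combined roughly as follows. This is a minimal sketch, not the code from the PR: run_isolated, passing_case, and failing_case are hypothetical names, and deleted.append stands in for katib_client.delete_experiment.

```python
import logging

logging.basicConfig(level=logging.INFO)


def run_isolated(name, test_fn, cleanup_fn):
    """Run one e2e case in its own try block and always clean up afterwards.

    Returns True on success and False on failure, so one failing case does
    not hide the result of the next one.
    """
    try:
        test_fn(name)
        logging.info("E2E succeeded for Experiment: %s", name)
        return True
    except Exception:
        logging.exception("E2E failed for Experiment: %s", name)
        return False
    finally:
        # Delete the Experiment before the next case starts, so a leftover
        # resource cannot cause an "already exists" error.
        cleanup_fn(name)


# Hypothetical stand-ins for the real test functions and for
# katib_client.delete_experiment; they only record what happened.
deleted = []


def passing_case(name):
    pass


def failing_case(name):
    raise RuntimeError("simulated failure")


results = {
    "tune-example-default-metrics-collector": run_isolated(
        "tune-example-default-metrics-collector", failing_case, deleted.append
    ),
    "tune-example-custom-metrics-collector": run_isolated(
        "tune-example-custom-metrics-collector", passing_case, deleted.append
    ),
}
```

Because cleanup sits in the finally clause, the second case still runs against a clean namespace even though the first one fails. A real script would likely exit non-zero if any entry in results is False.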

prakhar479 · 2024-09-01T20:02:38Z

@Electronic-Waste I have fixed these issues. Let me know if anything else is needed. Thanks!

Electronic-Waste · 2024-09-02T03:12:51Z

/rerun-all

Electronic-Waste · 2024-09-02T06:34:25Z

@prakhar479 Can you please fix the lint error and the error in tune API?

Electronic-Waste · 2024-09-02T10:08:32Z

/rerun-all

andreyvelich

Thank you for doing this @prakhar479!
I left a few comments.
cc @kubeflow/wg-automl-leads

andreyvelich · 2024-09-02T13:13:04Z

sdk/python/v1beta1/kubeflow/katib/api/katib_client.py

+            `metrics_collector_config`: Specify the configuration for the metrics collector with following keys:
+            - **kind**: Specify the kind of Metrics Collector. Currently supported values are:
+                - `StdOut`: Collects metrics from standard output.
+                - `None`: No metrics collection.


This is not supported.

Thanks for pointing this out. Can you point me to where I can find all the supported metrics collectors? For the current comment I had referenced https://github.com/kubeflow/katib/blob/master/pkg/ui/v1beta1/frontend/src/app/models/experiment.k8s.model.ts

Yeah, UI also needs to be updated with the latest changes cc @Electronic-Waste
Please refer to the official CRD API for the Metrics Collector spec: https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L207-L227

Thanks Andrey, I'll update UI with the latest changes.

andreyvelich · 2024-09-02T13:14:50Z

sdk/python/v1beta1/kubeflow/katib/api/katib_client.py

+            - **custom_collector**: If the `kind` is set to `Custom`, you must provide an instance of a custom `V1Container` as the value. For example:
+                `metrics_collector_config = {"kind" : "Custom", "custom_collector": <Instance of V1Container>}`.


Can we add support for custom_collectors in the followup PRs when we get user requests ?
I feel that Data Scientists who will use the tune API don't need such functionality.

Sorry, but can you elaborate on this a bit? I am not quite sure what you mean, since custom metrics collectors are already supported by the underlying framework for the tune function.

I meant that we can hold off and introduce it once we find use cases where the tune function needs a Custom metrics collector.
For example, I can see the value of the File metrics collector, since users can write metrics to a specific file from their training script, similar to TensorFlow events.
However, for a custom metrics collector, I can't see a use case where it would be useful with the tune function.

Any thoughts @shannonbradshaw @prakhar479 @johnugeorge @tenzen-y ?

andreyvelich · 2024-09-02T13:15:58Z

test/e2e/v1beta1/scripts/gh-actions/Dockerfile.dummy-collector

+
+RUN pip install kubernetes
+
+CMD ["python", "dummy-collector.py"]


Why do we need this container ?

This container serves as a dummy metrics collector for the e2e test.

andreyvelich · 2024-09-02T13:17:58Z

test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.py

@@ -11,8 +12,68 @@
 # The default logging config.
 logging.basicConfig(level=logging.INFO)

+def run_e2e_experiment_create_by_tune_custom_metrics_collector(


I don't think we need another E2E just to test this functionality.
We already run E2Es using these YAML files for various metrics collectors: https://github.com/kubeflow/katib/tree/master/examples/v1beta1/metrics-collector
I think for your feature, you just need to add unit tests for Katib Client to verify that Experiment has correct specification: https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/katib_client_test.py

Sure! Should I then remove the e2e test with the custom metrics collector from run-e2e-tune-api.py?

Yes, please. Can you add unit tests for your functionality here: https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/katib_client_test.py ?
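A unit test of the kind requested here only needs to check that the resulting spec carries the right fields, without running a cluster. The sketch below shows the shape such a test could take. It does not use the real KatibClient: to_collector_spec is a hypothetical helper, ATTRIBUTE_MAP copies the attribute_map quoted earlier in this thread, and a plain object stands in for a V1Container.

```python
import unittest

# Mirrors attribute_map from V1beta1CollectorSpec as quoted in this thread.
ATTRIBUTE_MAP = {"custom_collector": "customCollector", "kind": "kind"}


def to_collector_spec(metrics_collector_config):
    """Hypothetical helper: map SDK-style keys to the JSON keys of the spec."""
    unknown = set(metrics_collector_config) - set(ATTRIBUTE_MAP)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return {ATTRIBUTE_MAP[k]: v for k, v in metrics_collector_config.items()}


class CollectorSpecTest(unittest.TestCase):
    def test_custom_kind_keeps_container(self):
        container = object()  # stand-in for a V1Container instance
        spec = to_collector_spec({"kind": "Custom", "custom_collector": container})
        self.assertEqual(spec["kind"], "Custom")
        self.assertIs(spec["customCollector"], container)

    def test_unknown_key_is_rejected(self):
        with self.assertRaises(ValueError):
            to_collector_spec({"kind": "StdOut", "collector": "x"})
```

Saved as a module, these cases can be run with python -m unittest. In the actual test file, the assertions would instead target the Experiment object that tune builds.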

google-oss-prow bot requested review from andreyvelich, anencore94 and johnugeorge August 9, 2024 23:50

google-oss-prow bot added the size/XS label Aug 9, 2024

prakhar479 force-pushed the master branch from 1489e53 to 4e8c6fa Compare August 10, 2024 00:00

* refactor(sdk): added option for custom metric collector for tune in…

6c85385

… katlib_client.py Signed-off-by: Prakhar Singhal <[email protected]>

prakhar479 force-pushed the master branch from 4e8c6fa to 6c85385 Compare August 10, 2024 00:03

andreyvelich reviewed Aug 12, 2024

google-oss-prow bot added the ok-to-test label Aug 12, 2024

google-oss-prow bot added size/S and removed size/XS labels Aug 13, 2024

added default value to custom_collector field

b746da0

Signed-off-by: Prakhar Singhal <[email protected]>

prakhar479 force-pushed the master branch from ca616ca to b746da0 Compare August 13, 2024 20:23

Electronic-Waste reviewed Aug 15, 2024

prakhar479 added 2 commits August 15, 2024 17:04

Modified usage comment for custom_metric_collector

516cabe

Signed-off-by: Prakhar Singhal <[email protected]>

Modified e2e test for tune api to include a custom metric collector c…

67c9b78

…ontainer Signed-off-by: Prakhar Singhal <[email protected]>

google-oss-prow bot added size/M and removed size/S labels Aug 17, 2024

Modified build-load.sh to use _build_container functionality to build…

a0a7e53

… dummy-collector image Signed-off-by: Prakhar Singhal <[email protected]>

Electronic-Waste reviewed Aug 25, 2024

prakhar479 and others added 4 commits August 26, 2024 10:25

fixed metric_collector_config param in run-e2e-tune-api

80a718c

Signed-off-by: Prakhar Singhal <[email protected]>

Updated comments for metrics collector config param

9245bcd

Signed-off-by: Prakhar Singhal <[email protected]>

Merge branch 'master' into master

83ecb32

added e2e test for custom and default metric collectors

2d8ecd0

Signed-off-by: Prakhar Singhal <[email protected]>

google-oss-prow bot added size/L and removed size/M labels Aug 27, 2024

Electronic-Waste reviewed Aug 30, 2024

seperated e2e tests into seperate try blocks for better error handlin…

2222b99

…g and fixed some bugs Signed-off-by: Prakhar Singhal <[email protected]>

Fixed precommit lint check and directory bug in tune e2e tests

3f4ea55

Signed-off-by: Prakhar Singhal <[email protected]>

andreyvelich reviewed Sep 2, 2024

prakhar479 added 2 commits September 2, 2024 14:24

lint fix

a51b863

Signed-off-by: Prakhar Singhal <[email protected]>

Added executable permissions

1aa9c48

Signed-off-by: Prakhar Singhal <[email protected]>
