Move overriding Scylla Operator's log level to CI scripts and make it configurable #2297

rzetelskik · 2025-01-03T12:27:51Z

Description of your changes: Currently, the log levels of the Operator and the manager controller are patched after they have already been deployed, which causes a rolling restart. This prevents us from giving the operator guaranteed QoS in CI with resources >= half of what's available, because the PDB blocks the restart and everything gets stuck.

This PR moves the adjustment to CI scripts and makes the value configurable through e2e scripts.
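A minimal sketch of the idea, not the actual change in this PR: the log level is defaulted from an environment variable and written into the manifest before anything is deployed, so no later patch (and no rolling restart) is needed. The `set_loglevel` helper and the manifest fragment are hypothetical; only the `SO_SCYLLA_OPERATOR_LOGLEVEL` variable comes from this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Default that a CI job may override by exporting the variable beforehand.
SO_SCYLLA_OPERATOR_LOGLEVEL="${SO_SCYLLA_OPERATOR_LOGLEVEL:-4}"
export SO_SCYLLA_OPERATOR_LOGLEVEL

# Hypothetical helper: rewrite the --loglevel argument in a manifest read from stdin.
set_loglevel() {
  local level="$1"
  sed -E "s/--loglevel=[0-9]+/--loglevel=${level}/"
}

# Hypothetical manifest fragment standing in for the operator Deployment (assumption).
manifest='        args:
        - operator
        - --loglevel=2'

# Render the manifest with the desired log level before it would be applied.
rendered="$( printf '%s\n' "${manifest}" | set_loglevel "${SO_SCYLLA_OPERATOR_LOGLEVEL}" )"
printf '%s\n' "${rendered}"
```

Because the value is set before the first apply, the Deployment is created once with its final arguments.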

Which issue is resolved by this Pull Request:
Resolves #2296

/cc

scylla-operator-bot · 2025-01-03T12:27:53Z

@rzetelskik: GitHub didn't allow me to request PR reviews from the following users: rzetelskik.

Note that only scylladb members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Description of your changes: wip

Which issue is resolved by this Pull Request:
Resolves #2296

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

rzetelskik · 2025-01-08T15:11:24Z

Proof that loglevel propagated in ci-deploy.sh:

I added a transitional commit setting the env vars in hack/.ci/deploy-release-gke.sh to test it. The loglevel was propagated:

rzetelskik · 2025-01-09T09:52:17Z

/priority important-soon
/kind machinery

mflendrich

Assuming that the in-place editing pattern for the operator & manager kustomizations is already present in these scripts (as opposed to creating a dependent kustomization), I agree that following the established way of doing things is the right decision (whilst a rewrite to a declarative approach is strongly preferable for maintainability reasons).

mflendrich · 2025-01-09T13:24:52Z

hack/.ci/run-e2e-gke-release.sh

+SO_SCYLLA_OPERATOR_LOGLEVEL="${SO_SCYLLA_OPERATOR_LOGLEVEL:-4}"
+export SO_SCYLLA_OPERATOR_LOGLEVEL
+


What do you think about creating a _run-e2e-common.inc.sh file with some of the bash boilerplate? (to get rid of the "repeated constants" code smell)

that would then be included using something like

source "$( dirname "${BASH_SOURCE[0]}" )/_run-e2e-common.inc.sh"
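For illustration, the suggested include could look roughly like the sketch below. The contents of `_run-e2e-common.inc.sh` and the `run-e2e-example.sh` script are assumptions made for this example; it writes both files to a temporary directory and shows that a value exported by the caller still wins over the shared default.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$( mktemp -d )"
trap 'rm -rf "${workdir}"' EXIT

# Hypothetical shared include holding defaults that repeat across platforms.
cat > "${workdir}/_run-e2e-common.inc.sh" <<'EOF'
SO_SCYLLA_OPERATOR_LOGLEVEL="${SO_SCYLLA_OPERATOR_LOGLEVEL:-4}"
export SO_SCYLLA_OPERATOR_LOGLEVEL
EOF

# Hypothetical platform script that pulls in the shared defaults.
cat > "${workdir}/run-e2e-example.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
source "$( dirname "${BASH_SOURCE[0]}" )/_run-e2e-common.inc.sh"
echo "loglevel=${SO_SCYLLA_OPERATOR_LOGLEVEL}"
EOF

# Run once with the variable unset and once with a caller-provided override.
default_output="$( env -u SO_SCYLLA_OPERATOR_LOGLEVEL bash "${workdir}/run-e2e-example.sh" )"
override_output="$( SO_SCYLLA_OPERATOR_LOGLEVEL=2 bash "${workdir}/run-e2e-example.sh" )"
echo "${default_output}"
echo "${override_output}"
```

The `${VAR:-default}` form keeps the precedence simple: a value from the CI job or the calling script is used if present, otherwise the shared default applies.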

The idea behind separate scripts was to make the config platform-dependent and allow for configuring it in the CI jobs. The values here are meant as platform-dependent defaults, but as you can see, most of them repeat across different platforms.
This particular one could be embedded in ci-deploy.sh and ci-deploy-release.sh scripts, but it won't hold for some other examples (e.g. resources, as in #2276, because it will prevent us from using it locally and/or on less powerful setups).
I think having the shared sourced script could clean things up a bit, but I'm wondering if it's not going to obfuscate where the values are coming from in the end (given they could then come from the CI job, the shared script, or the e2e script)?
I'll make it your call. 😉

The option of embedding the default log level in ci-deploy.sh and ci-deploy-release.sh directly (and inlining the loglevel patch in the kustomization yaml) looks the most promising to me, because this config is unlikely to be environment-dependent and adding extra logic here brings no clear benefit.

Let's do it (that is: drop the conditional, embed the loglevel patch directly in the kustomization) if you think the same way. Structuring the config in a streamlined way would require a fundamental restructuring of the deploy machinery - something we won't do in this PR.

hack/ci-deploy.sh

rzetelskik · 2025-01-09T14:20:02Z

whilst a rewrite to a declarative approach is strongly preferable for maintainability reasons

I have a couple of similar PRs in the queue, so maybe it's worth considering. I'm not sure how you would see this done, though?

mflendrich · 2025-01-09T17:24:53Z

I'm not sure how you would see this done, though?

I have started looking and it was more complex than I managed to produce a complete recipe for in a timeboxed effort. But basically the idea is that:

the kustomization file is either static or generated from a single "flat" template
the patches are static (and - where appropriate - replaced with kustomize transformers like images, replacements etc.)
what varies between executions is: decisions whether to apply specific patches or not and what values to fill in - but keeping the structure of the kustomization fixed.

I am more than happy to try exploring this tomorrow, or to pair program. Kustomize has one important limitation: it doesn't like accepting external context, which we'd need to work around somehow.
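A rough sketch of the shape described above, under stated assumptions: the kustomization is rendered from one flat template, the patch files are static, and the only thing that varies between executions is which patches are listed. The `render_kustomization` function, the `EXAMPLE_*` variables and the file names are hypothetical, and the sketch does not address filling values into patches or the external-context limitation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Emit a kustomization whose layout never changes; only the list of static
# patch files (passed as arguments) varies between executions.
render_kustomization() {
  cat <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- operator.yaml
EOF
  if (( $# > 0 )); then
    echo "patches:"
    local patch
    for patch in "$@"; do
      echo "- path: ${patch}"
    done
  fi
}

# Decide which static patches to apply. Variable and file names are hypothetical.
patches=()
if [[ "${EXAMPLE_ENABLE_LOGLEVEL_PATCH:-true}" == "true" ]]; then
  patches+=( "patches/operator-loglevel.yaml" )
fi
if [[ "${EXAMPLE_ENABLE_RESOURCES_PATCH:-false}" == "true" ]]; then
  patches+=( "patches/operator-resources.yaml" )
fi

# The ${arr[@]+...} form keeps an empty array safe under `set -u` on older bash.
kustomization="$( render_kustomization ${patches[@]+"${patches[@]}"} )"
printf '%s\n' "${kustomization}"
```

The scripts would then only make yes/no decisions, while the patch contents stay reviewable as plain files.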

rzetelskik · 2025-01-10T10:37:54Z

I'm not sure how you would see this done, though?

I have started looking and it was more complex than I managed to produce a complete recipe for in a timeboxed effort. But basically the idea is that:

the kustomization file is either static or generated from a single "flat" template

the patches are static (and - where appropriate - replaced with kustomize transformers like images, replacements etc.)

what varies between executions is: decisions whether to apply specific patches or not and what values to fill in - but keeping the structure of the kustomization fixed.

I thought about it before, but I figured it wouldn't make much of a difference, since the patches would still have to be conditionally modified and concatenated to the kustomize file. It might make things more understandable, but I don't think it will change the flow in these scripts much, so it seems mostly orthogonal to this PR.
How about we take this as a follow-up to this and the other PRs in the queue?

rzetelskik · 2025-01-10T17:38:39Z

@mflendrich I believe I addressed all of your comments - let me know if we're on the same page now

mflendrich

looks good 👍

/lgtm
/approve

scylla-operator-bot · 2025-01-13T07:33:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mflendrich, rzetelskik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [mflendrich]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rzetelskik · 2025-01-13T08:48:06Z

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gke-parallel-clusterip | e981aa9 | link | true | /test e2e-gke-parallel-clusterip |

Full PR test history. Your PR dashboard.

#2304 (comment)
known manager flake
/test images
/retest

rzetelskik force-pushed the operator-loglevel-ci-script branch from 674ff94 to 4570486 Compare January 8, 2025 11:59

scylla-operator-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 8, 2025

rzetelskik force-pushed the operator-loglevel-ci-script branch 4 times, most recently from 6e104a8 to 7e42f88 Compare January 8, 2025 13:26

scylla-operator-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 8, 2025

rzetelskik changed the title ~~[WIP] Move overriding Operator's and Manager's log level to CI scripts and make it configurable~~ [WIP] Move overriding Scylla Operator's and Scylla Manager controller's log level to CI scripts and make it configurable Jan 8, 2025

rzetelskik changed the title ~~[WIP] Move overriding Scylla Operator's and Scylla Manager controller's log level to CI scripts and make it configurable~~ [WIP] Move overriding Scylla Operator's log level to CI scripts and make it configurable Jan 8, 2025

rzetelskik force-pushed the operator-loglevel-ci-script branch 2 times, most recently from 4f970b5 to 67740d8 Compare January 8, 2025 15:10

rzetelskik force-pushed the operator-loglevel-ci-script branch 6 times, most recently from 76ad542 to 235da58 Compare January 9, 2025 09:51

rzetelskik changed the title ~~[WIP] Move overriding Scylla Operator's log level to CI scripts and make it configurable~~ Move overriding Scylla Operator's log level to CI scripts and make it configurable Jan 9, 2025

scylla-operator-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 9, 2025

mflendrich reviewed Jan 9, 2025

rzetelskik force-pushed the operator-loglevel-ci-script branch from 235da58 to 721987b Compare January 10, 2025 10:58

rzetelskik changed the title ~~Move overriding Scylla Operator's log level to CI scripts and make it configurable~~ [WIP] Move overriding Scylla Operator's log level to CI scripts and make it configurable Jan 10, 2025

scylla-operator-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 10, 2025

rzetelskik force-pushed the operator-loglevel-ci-script branch 2 times, most recently from cc4e7d3 to ce2750c Compare January 10, 2025 14:30

scylla-operator-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 10, 2025

rzetelskik force-pushed the operator-loglevel-ci-script branch from ce2750c to ecb7300 Compare January 10, 2025 15:21

Commit e981aa9: Move overriding Scylla Operator's log level to CI scripts and make it configurable

rzetelskik force-pushed the operator-loglevel-ci-script branch 2 times, most recently from 0228d81 to e981aa9 Compare January 10, 2025 17:15

rzetelskik requested a review from mflendrich January 10, 2025 17:16

rzetelskik changed the title ~~[WIP] Move overriding Scylla Operator's log level to CI scripts and make it configurable~~ Move overriding Scylla Operator's log level to CI scripts and make it configurable Jan 10, 2025

scylla-operator-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 10, 2025

mflendrich approved these changes Jan 13, 2025

scylla-operator-bot bot assigned mflendrich Jan 13, 2025

scylla-operator-bot bot added the lgtm Indicates that a PR is ready to be merged. label Jan 13, 2025

scylla-operator-bot bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 13, 2025

scylla-operator-bot bot merged commit 7ed4ab4 into scylladb:master Jan 13, 2025
14 checks passed

rzetelskik deleted the operator-loglevel-ci-script branch January 13, 2025 09:42

mflendrich mentioned this pull request Feb 12, 2025

Raise default ScyllaDB log level in CI for better debugging #2373

Closed

1 task
