
Missing BottlerocketShadow CRD causes excessive logging #478

Closed
@maxres-ch


https://github.com/bottlerocket-os/bottlerocket-update-operator/issues?q=is%3Aissue+backoff

Image I'm using:
v1.0.0

Issue or Feature Request:

We've started using the Helm chart on the develop branch and missed that the shadow chart also had to be installed. When the operator started, it rightly logged error messages because the API didn't exist. We didn't notice the errors because the controller still reported healthy.

We saw around 53.6 million error messages in a 7-minute window, with a peak of 130K messages/sec, while the BottlerocketShadow CRD was not installed.
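As a rough sanity check on those figures, the sketch below computes the average rate implied by the totals, assuming the window was exactly 7 minutes (420 seconds). The function name `average_rate` is purely illustrative.

```rust
/// Average messages per second over a window.
/// Hypothetical helper for illustration; not part of brupop.
pub fn average_rate(total_messages: f64, window_secs: f64) -> f64 {
    total_messages / window_secs
}

/// The figures reported above: 53.6 million messages in 7 minutes.
pub fn reported_average() -> f64 {
    average_rate(53.6e6, 7.0 * 60.0)
}
```

Under that assumption the average works out to roughly 127.6K messages/sec, close to the 130K peak, which suggests the flood was sustained for the whole window rather than a short burst.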

I would have expected either that the controller would not report healthy, or that exponential backoff would be used when these kinds of errors are hit (the retries could potentially DoS the kube-apiserver too). Controllers I've written in the past used backoff to prevent runaway logging and hammering the API. I saw that at least one of the errors had a static backoff of 5 seconds, but some others seem to have no backoff at all.
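To illustrate the kind of backoff I mean, here is a minimal std-only sketch of capped exponential backoff. It is not brupop's code: `backoff_delay` and `retry_with_backoff` are hypothetical names, and a real controller would likely use an async sleep and add jitter.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
/// Hypothetical helper for illustration; not part of brupop.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate the multiplier so very large attempt counts cannot overflow.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Call `op` until it succeeds or `max_attempts` calls have failed,
/// sleeping with exponentially growing delays between failures.
/// `op` is always called at least once.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With a 5 second base and a 5 minute cap, for example, the delays would go 5s, 10s, 20s, 40s and so on until they reach the cap, so a persistently missing CRD would produce a handful of log lines per minute instead of thousands per second.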

 Message
[wildcard]*[/wildcard]:[wildcard][00-59][/wildcard]:[wildcard][00.018549-59.992987][/wildcard]Z  WARN kube_client::client: Unsuccessful data error parse: 404 page not found
    at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/kube-client-0.76.0/src/client/mod.rs:434
    in [wildcard]*[/wildcard]::[wildcard]*[/wildcard]::[wildcard]*[/wildcard]
    at [wildcard]*[/wildcard][wildcard]/src/*[/wildcard]
[wildcard]*[/wildcard]:[wildcard][02-59][/wildcard]:[wildcard][00.593393-58.285542][/wildcard]Z  INFO agent::apiclient: API server busy, retrying later ...
    in agent::agentclient::shadow_status_with_refreshed_system_matadata with shadow_error_info: ShadowErrorInfo { crash_count: 0, state_transition_failure_timestamp: None }
    in agent::agentclient::update_status_in_shadow with bottlerocket_shadow: BottlerocketShadow { metadata: ObjectMeta { annotations: None, cluster_name: None, creation_timestamp: Some(Time([wildcard]yyyy-MM-ddTHH:mm:ssXXX[/wildcard])), deletion_grace_period_seconds: None, deletion_timestamp: None, finalizers: None, generate_name: None, generation: Some(1), labels: None, managed_fields: Some([ManagedFieldsEntry { api_version: Some("brupop.bottlerocket.aws/v2"), fields_type: Some("FieldsV1"), fields_v1: Some(FieldsV1(Object {"f:metadata": Object {"f:ownerReferences": Object {".": Object {}, "k:{\"uid\":\"[wildcard]*[/wildcard]\"}": Object {}}}, "f:spec": Object {".": Object {}, "f:state": Object {}, "f:state_transition_timestamp": Object {}, "f:version": Object {}}})), manager: Some("unknown"), operation: Some("Update"), time: Some(Time([wildcard]yyyy-MM-ddTHH:mm:ssXXX[/wildcard])) }, ManagedFieldsEntry { api_version: Some("brupop.bottlerocket.aws/v2"), fields_type: Some("FieldsV1"), fields_v1: Some(FieldsV1(Object {"f:status": Object {".": Object {}, "f:crash_count": Object {}, "f:current_state": Object {}, "f:current_version": Object {}, "f:target_version": Object {}}})), manager: Some("unknown"), operation: Some("Update"), time: Some(Time([wildcard]yyyy-MM-ddTHH:mm:ssXXX[/wildcard])) }]), name: Some("[wildcard]*[/wildcard].redacted.compute.internal"), namespace: Some("brupop-bottlerocket-aws"), owner_references: Some([OwnerReference { api_version: "v1", block_owner_deletion: None, controller: None, kind: "Node", name: "[wildcard]*[/wildcard].us-west-2.compute.internal", uid: "[wildcard]*[/wildcard]" }]), resource_version: Some("[wildcard][54829321-278377934][/wildcard]"), self_link: None, uid: Some("[wildcard]*[/wildcard]") }, spec: BottlerocketShadowSpec { state: Idle, state_transition_timestamp: None, version: None }, status: Some(BottlerocketShadowStatus { current_version: "1.14.1", target_version: "1.14.1", current_state: Idle, crash_count: 0, state_transition_failure_timestamp: None }) }, state: Idle, shadow_error_info: ShadowErrorInfo { crash_count: 0, state_transition_failure_timestamp: None }
[wildcard]*[/wildcard]:[wildcard][03-59][/wildcard]:[wildcard][04.738003-58.311911][/wildcard]Z  INFO controller::controller: Found associated bottlerocketshadow name., associated_bottlerocketshadow_name: "[wildcard]*[/wildcard].redacted.compute.internal"
[wildcard]*[/wildcard]:[wildcard][00-59][/wildcard]:[wildcard][01.302051-52.595513][/wildcard]Z  INFO controller::controller: Calculating if current time is within update time window.
    in apiserver::telemetry::HTTP request with http.method: POST, http.route: /bottlerocket-node-resource, http.flavor: 1.1, http.scheme: https, http.host: brupop-apiserver.brupop-bottlerocket-aws.svc.cluster.local, http.client_ip: [wildcard]10.52.XXX.XXX[/wildcard]:[wildcard]XXXX[/wildcard], http.user_agent: , http.target: /bottlerocket-node-resource, otel.kind: "server", request_id: [wildcard]*[/wildcard], node_name: "[wildcard]*[/wildcard].redacted.compute.internal"
    in apiserver::telemetry::HTTP request with http.method: POST, http.route: /bottlerocket-node-resource, http.flavor: 1.1, http.scheme: https, http.host: brupop-apiserver.brupop-bottlerocket-aws.svc.cluster.local, http.client_ip: [wildcard]10.52.XXX.XXX[/wildcard]:[wildcard]XXXX[/wildcard], http.user_agent: , http.target: /bottlerocket-node-resource, otel.kind: "server", request_id: [wildcard]*[/wildcard], node_name: "[wildcard]*[/wildcard].redacted.compute.internal", exception.message: Error creating BottlerocketShadow: 'Unable to create BottlerocketShadow ([wildcard]*[/wildcard].us-west-2.compute.internal, [wildcard]*[/wildcard]): 'ApiError: "404 page not found ": Failed to parse error data (ErrorResponse { status: "404 Not Found", message: "\"404 page not found\\n\"", reason: "Failed to parse error data", code: 404 })'', exception.details: BottlerocketShadowCreate { source: CreateBottlerocketShadow { source: Api(ErrorResponse { status: "404 Not Found", message: "\"404 page not found\\n\"", reason: "Failed to parse error data", code: 404 }), selector: BottlerocketShadowSelector { node_name: "[wildcard]*[/wildcard].us-west-2.compute.internal", node_uid: "[wildcard]*[/wildcard]" } } }, http.status_code: 500, otel.status_code: "ERROR"
2023-07-05T21:38:39.981472Z  WARN agent::agentclient: An error occurred when try to create BottlerocketShadow. Restarting event loop.
    in models::node::client::create_node with selector: BottlerocketShadowSelector { node_name: "ip-10-52-96-90.redacted.compute.internal", node_uid: "340b1e0b-a34a-4557-99f9-4d2e55dfec7a" }

Slightly better view of logged message patterns:

Screen Shot 2023-07-06 at 1 12 26 PM

Screen Shot 2023-07-06 at 1 37 11 PM
