Closed
Description
During SCT longevity test execution for Scylla Operator, one of the nemeses (disrupt_truncate_large_partition) fails with the following error:
https://argus.scylladb.com/tests/scylla-cluster-tests/cc29b0a8-f215-4d69-abb8-fc6698b8849d
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5720, in wrapper
result = method(*args[1:], **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2189, in disrupt_truncate_large_partition
self.tester.verify_stress_thread(bench_thread, error_handler=self._nemesis_stress_failure_handler)
File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 2354, in verify_stress_thread
error_handler(thread_pool, errors)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2141, in _nemesis_stress_failure_handler
raise NemesisStressFailure(
sdcm.exceptions.NemesisStressFailure: Aborting 'SisyphusMonkey' nemesis as stress command failed with the following errors:
on node 'sct-loaders-eu-west-2-0': ['Stress command execution failed with: (400)\nReason: Bad Request\nHTTP response headers: HTTPHeaderDict({\'Audit-Id\': \'e37a585a-860f-42d6-983b-6dc445870981\', \'Cache-Control\': \'no-cache, private\', \'Content-Type\': \'application/json\', \'Date\': \'Fri, 02 May 2025 08:03:47 GMT\', \'Content-Length\': \'230\'})\nHTTP response body: b\'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \\\\"loader\\\\" in pod \\\\"sct-loaders-eu-west-2-0-pod-7\\\\" is waiting to start: trying and failing to pull image","reason":"BadRequest","code":400}\\n\'\n']
on node 'sct-loaders-eu-west-2-1': ['Stress command execution failed with: (400)\nReason: Bad Request\nHTTP response headers: HTTPHeaderDict({\'Audit-Id\': \'21b319ae-a5a8-45d3-a134-a90848787594\', \'Cache-Control\': \'no-cache, private\', \'Content-Type\': \'application/json\', \'Date\': \'Fri, 02 May 2025 08:03:51 GMT\', \'Content-Length\': \'230\'})\nHTTP response body: b\'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \\\\"loader\\\\" in pod \\\\"sct-loaders-eu-west-2-1-pod-4\\\\" is waiting to start: trying and failing to pull image","reason":"BadRequest","code":400}\\n\'\n']
After some further investigation, I found in the logs that the scylla-bench 0.1.25 image fails to pull:
< t:2025-04-22 10:22:33,892 f:base.py l:225 c:LocalCmdRunner p:DEBUG > <ip-35-177-15-185>: state:
< t:2025-04-22 10:22:33,892 f:base.py l:225 c:LocalCmdRunner p:DEBUG > <ip-35-177-15-185>: waiting:
< t:2025-04-22 10:22:33,892 f:base.py l:225 c:LocalCmdRunner p:DEBUG > <ip-35-177-15-185>: message: 'Back-off pulling image "scylladb/scylla-bench:0.1.25": ErrImagePull:
< t:2025-04-22 10:22:33,892 f:base.py l:225 c:LocalCmdRunner p:DEBUG > <ip-35-177-15-185>: rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/scylladb/scylla-bench:0.1.25":
< t:2025-04-22 10:22:33,892 f:base.py l:225 c:LocalCmdRunner p:DEBUG > <ip-35-177-15-185>: no match for platform in manifest: not found'
< t:2025-04-22 10:22:33,892 f:base.py l:225 c:LocalCmdRunner p:DEBUG > <ip-35-177-15-185>: reason: ImagePullBackOff
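The "no match for platform in manifest" message usually means that the image's manifest list has no entry for the OS/architecture of the node pulling it. A minimal sketch of how this could be checked offline, assuming the manifest list JSON was saved beforehand (for example from `docker manifest inspect`). The helper names and the sample data are hypothetical and do not reflect the real contents of scylladb/scylla-bench:0.1.25:

```python
import json


def supported_platforms(manifest_index: dict) -> set:
    """Collect the (os, architecture) pairs listed in a manifest list / image index."""
    platforms = set()
    for entry in manifest_index.get("manifests", []):
        platform = entry.get("platform", {})
        platforms.add((platform.get("os"), platform.get("architecture")))
    return platforms


def has_platform(manifest_index: dict, os_name: str, arch: str) -> bool:
    """Return True if the index contains an entry for the given OS and architecture."""
    return (os_name, arch) in supported_platforms(manifest_index)


# Hypothetical sample index for illustration only; NOT the real manifest of any image.
SAMPLE_INDEX = json.loads("""
{
  "schemaVersion": 2,
  "manifests": [
    {"digest": "sha256:aaaa", "platform": {"os": "linux", "architecture": "arm64"}}
  ]
}
""")

if __name__ == "__main__":
    for os_name, arch in [("linux", "amd64"), ("linux", "arm64")]:
        status = "present" if has_platform(SAMPLE_INDEX, os_name, arch) else "MISSING"
        print(f"{os_name}/{arch}: {status}")
```

If the architecture of the loader nodes is reported as missing for the real image, that would be consistent with the ErrImagePull seen in the log above.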
As advised by @dimakr, I added an extra parameter to the test run to use scylla-bench 0.1.24 instead of 0.1.25. With this change the nemesis passed without errors, so there is probably something wrong with the 0.1.25 image.
SCT_STRESS_IMAGE={"scylla-bench":"scylladb/hydra-loaders:scylla-bench-v0.1.24"}
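The override value is a JSON object mapping the stress tool name to an image reference. A small sketch of generating it programmatically instead of typing it by hand; `build_stress_image_override` is a hypothetical helper, not part of SCT, and how SCT parses the variable is not covered here:

```python
import json


def build_stress_image_override(tool: str, image: str) -> str:
    """Build a compact JSON mapping of tool name to image (hypothetical helper)."""
    # Compact separators keep the value free of spaces, matching the line above.
    return json.dumps({tool: image}, separators=(",", ":"))


if __name__ == "__main__":
    value = build_stress_image_override(
        "scylla-bench", "scylladb/hydra-loaders:scylla-bench-v0.1.24"
    )
    print(f"SCT_STRESS_IMAGE={value}")
```

Generating the value this way avoids quoting mistakes when the mapping is passed through a shell or a CI job parameter.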
Passing run:
https://argus.scylladb.com/tests/scylla-cluster-tests/6550b2c7-cda1-4db6-9058-ca9f3984a1a6