
0.1.25 scylla-bench image pull error while executing some longevity tests for Scylla Operator (disrupt_truncate_large_partition nemesis) #184

@grzywin


While running SCT longevity tests for Scylla Operator, one of the nemeses (disrupt_truncate_large_partition) fails with the following error:
https://argus.scylladb.com/tests/scylla-cluster-tests/cc29b0a8-f215-4d69-abb8-fc6698b8849d

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5720, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2189, in disrupt_truncate_large_partition
    self.tester.verify_stress_thread(bench_thread, error_handler=self._nemesis_stress_failure_handler)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 2354, in verify_stress_thread
    error_handler(thread_pool, errors)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2141, in _nemesis_stress_failure_handler
    raise NemesisStressFailure(
sdcm.exceptions.NemesisStressFailure: Aborting 'SisyphusMonkey' nemesis as stress command failed with the following errors:
 on node 'sct-loaders-eu-west-2-0': ['Stress command execution failed with: (400)\nReason: Bad Request\nHTTP response headers: HTTPHeaderDict({\'Audit-Id\': \'e37a585a-860f-42d6-983b-6dc445870981\', \'Cache-Control\': \'no-cache, private\', \'Content-Type\': \'application/json\', \'Date\': \'Fri, 02 May 2025 08:03:47 GMT\', \'Content-Length\': \'230\'})\nHTTP response body: b\'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \\\\"loader\\\\" in pod \\\\"sct-loaders-eu-west-2-0-pod-7\\\\" is waiting to start: trying and failing to pull image","reason":"BadRequest","code":400}\\n\'\n']
 on node 'sct-loaders-eu-west-2-1': ['Stress command execution failed with: (400)\nReason: Bad Request\nHTTP response headers: HTTPHeaderDict({\'Audit-Id\': \'21b319ae-a5a8-45d3-a134-a90848787594\', \'Cache-Control\': \'no-cache, private\', \'Content-Type\': \'application/json\', \'Date\': \'Fri, 02 May 2025 08:03:51 GMT\', \'Content-Length\': \'230\'})\nHTTP response body: b\'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \\\\"loader\\\\" in pod \\\\"sct-loaders-eu-west-2-1-pod-4\\\\" is waiting to start: trying and failing to pull image","reason":"BadRequest","code":400}\\n\'\n']

After some further investigation, I found in the logs that there is a problem pulling the scylla-bench 0.1.25 image:

< t:2025-04-22 10:22:33,892 f:base.py         l:225  c:LocalCmdRunner       p:DEBUG > <ip-35-177-15-185>:     state:
< t:2025-04-22 10:22:33,892 f:base.py         l:225  c:LocalCmdRunner       p:DEBUG > <ip-35-177-15-185>:       waiting:
< t:2025-04-22 10:22:33,892 f:base.py         l:225  c:LocalCmdRunner       p:DEBUG > <ip-35-177-15-185>:         message: 'Back-off pulling image "scylladb/scylla-bench:0.1.25": ErrImagePull:
< t:2025-04-22 10:22:33,892 f:base.py         l:225  c:LocalCmdRunner       p:DEBUG > <ip-35-177-15-185>:           rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/scylladb/scylla-bench:0.1.25":
< t:2025-04-22 10:22:33,892 f:base.py         l:225  c:LocalCmdRunner       p:DEBUG > <ip-35-177-15-185>:           no match for platform in manifest: not found'
< t:2025-04-22 10:22:33,892 f:base.py         l:225  c:LocalCmdRunner       p:DEBUG > <ip-35-177-15-185>:         reason: ImagePullBackOff
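The "no match for platform in manifest" message usually means the image's manifest list has no entry for the OS/architecture of the node that tries to pull it. A minimal sketch for checking this, assuming the manifest JSON has already been fetched (for example with `docker manifest inspect`). The function names and `SAMPLE_INDEX` are hypothetical, and the sample is made-up data, not the real manifest of the 0.1.25 image:

```python
import json


def platforms_in_index(index_json: str) -> set[tuple[str, str]]:
    """Return the (os, architecture) pairs listed in an image index / manifest list.

    A single-platform manifest has no "manifests" key, so the result is empty.
    """
    index = json.loads(index_json)
    pairs = set()
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        pairs.add((platform.get("os", "unknown"),
                   platform.get("architecture", "unknown")))
    return pairs


def supports(index_json: str, os_name: str, arch: str) -> bool:
    """True if the index advertises an image for the given OS and architecture."""
    return (os_name, arch) in platforms_in_index(index_json)


# Hypothetical index for illustration only. It does NOT describe the actual
# scylladb/scylla-bench:0.1.25 image.
SAMPLE_INDEX = json.dumps({
    "schemaVersion": 2,
    "manifests": [
        {"digest": "sha256:<placeholder>",
         "platform": {"os": "linux", "architecture": "arm64"}},
    ],
})

if __name__ == "__main__":
    print(sorted(platforms_in_index(SAMPLE_INDEX)))
    print("linux/amd64 supported:", supports(SAMPLE_INDEX, "linux", "amd64"))
```

Comparing the result with the architecture of the loader nodes (the `kubernetes.io/arch` node label) would show whether the image simply lacks a build for that platform.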

As advised by @dimakr, I added an extra parameter to the test run to use scylla-bench 0.1.24 instead of 0.1.25. With this change the nemesis passed without error, so there is probably something wrong with the 0.1.25 image.
SCT_STRESS_IMAGE={"scylla-bench":"scylladb/hydra-loaders:scylla-bench-v0.1.24"}
Passing run:
https://argus.scylladb.com/tests/scylla-cluster-tests/6550b2c7-cda1-4db6-9058-ca9f3984a1a6
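The override value is a JSON object mapping a tool name to an image reference, and it is easy to break through shell quoting. A small sanity check could look like the sketch below. It makes no claim about how SCT itself parses `SCT_STRESS_IMAGE`; the helper name `parse_stress_image_override` is hypothetical, and requiring an explicit tag is an illustrative choice:

```python
import json


def parse_stress_image_override(raw: str) -> dict[str, str]:
    """Parse a tool-to-image JSON mapping and reject obviously malformed values."""
    value = json.loads(raw)
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object mapping tool name to image")
    for tool, image in value.items():
        # Illustrative rule: insist on an explicit tag so the version is pinned.
        if not isinstance(image, str) or ":" not in image:
            raise ValueError(f"image for {tool!r} must be a 'repo:tag' string")
    return value


if __name__ == "__main__":
    override = parse_stress_image_override(
        '{"scylla-bench":"scylladb/hydra-loaders:scylla-bench-v0.1.24"}'
    )
    print(override["scylla-bench"])
```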
