Litmus attacks fail to work on OpenShift Cluster v4.3 #1538
Yes, you are right: Pumba does not support the CRI-O runtime. We use Pumba (Docker) as the default LIB. You can use a different LIB to run the container-kill experiment on containerd/CRI-O runtimes; internally it uses crictl. Please verify the socket path on your cluster nodes and modify it accordingly. We can override the value of
Thanks a lot, Shubham. To verify the socket path, do I run something like the below? Thanks again.
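A minimal sketch of such a check, assuming shell access to the node (on OpenShift, typically via `oc debug node/<node-name>` followed by `chroot /host`). The helper name `find_runtime_socket` and the list of candidate paths are illustrative assumptions; only `/var/run/crio/crio.sock` is taken from this thread.

```shell
#!/bin/sh
# Print the first argument that exists as a Unix socket; return 1 if none does.
# The helper name is hypothetical and not part of Litmus or crictl.
find_runtime_socket() {
  for candidate in "$@"; do
    if [ -S "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Candidate paths are assumptions; adjust them to match your nodes.
find_runtime_socket \
  /var/run/crio/crio.sock \
  /run/containerd/containerd.sock \
  /var/run/docker.sock \
  || echo "no runtime socket found at the candidate paths" >&2
```

Once a path is confirmed, crictl can be pointed at it explicitly, for example `crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps`. If `/etc/crictl.yaml` exists on the node, it usually records the same endpoint under `runtime-endpoint`.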
Hi Shubham, below is the chaosengine.yaml, but no luck with the attack. Is there anything you can suggest? I'm unsure whether I need to change anything in it. Many thanks.
cc: @gprasath @ispeakc0de
Hi @Vijay5775, can you please share the following information so we can understand the problem?
As requested:
#1 containerd-chaos-sdjnfb
Observation: The log window just shows 'Hello!' and the pod continues to show a status of 'running'. It doesn't exit even after the test completes unless you delete it manually. I've attached the log file pulled from the OpenShift console (again, it just contains the text 'Hello!').
Pending: I will provide the details of the sock file /var/run/crio/crio.sock and /etc/crictl.yaml, possibly by noon today.
#2 container-kill-0brnoh-6fp2b
Observation: I could see this message on line 133 and in subsequent lines:
I have attached the log for you to review and advise further on whether the issue is due to the above. (I have masked only the IP as 10.xx.xx.xx; the rest is as-is from the OpenShift console.)
#3 articles-chaos-runner-chaos-runner
The pending details are below, thanks.
I'd appreciate it if you could take a look and advise on a fix. Many thanks.
Hi @Vijay5775, thanks for providing all the information. I have made a few modifications to the container-kill experiment. It would be great if you could try the following experiment CR (modified according to your use case). Please let me know if you still face any problems running this experiment.
Hi @ispeakc0de, thanks for your prompt response and support. After applying the new 'container-kill' experiment supplied above, I did run into errors initially, but was able to spot that the previous 'chaosengine.yaml' had a reference to LIB_IMAGE as " Results are below. There are a few things I need your help with:
Thanks again for the excellent support provided. Much appreciated, cheers.
Hi @ispeakc0de, the other thing I noticed is that the container-kill experiment only runs successfully the first time. Once I change the target container details within the same YAML files (chaosengine.yaml and container_kill_experiment.yaml) and run the test again to kill a different container, the test doesn't appear to run, and no new container-kill-xxxx pods are scheduled. (It also doesn't work if I target the first container again, which was previously successful.) I wasn't able to debug this much further, but could see the below. Can you please advise? Many thanks.
Yes, the experiment docs have the
There is a one-to-one mapping between the chaos experiment and the chaos engine. If you modify the chaos experiment (change its spec and recreate it), the corresponding chaos engine cannot consume the new changes, because it has already created its resources from the older version. To pick up the changes, the corresponding chaos engine must be recreated as well, so that it points to the newer version of the chaos experiment. From the provided screenshot I am able to see a
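The recreate sequence for an engine after its experiment has changed could look roughly like the sketch below. The helper `recreate_engine_cmds`, the engine name `my-engine`, and the namespace `litmus` are placeholders, not values from this thread; the file names are the ones mentioned earlier. The helper only prints the kubectl commands so they can be reviewed before running.

```shell
#!/bin/sh
# Print the kubectl commands that drop the old ChaosEngine, apply the updated
# experiment, and then recreate the engine so it picks up the new spec.
# All arguments are placeholders supplied by the caller.
recreate_engine_cmds() {
  engine="$1"
  namespace="$2"
  experiment_file="$3"
  engine_file="$4"
  printf 'kubectl delete chaosengine %s -n %s --ignore-not-found\n' "$engine" "$namespace"
  printf 'kubectl apply -f %s -n %s\n' "$experiment_file" "$namespace"
  printf 'kubectl apply -f %s -n %s\n' "$engine_file" "$namespace"
}

# Review the output first; pipe it to sh to actually execute the commands.
recreate_engine_cmds my-engine litmus container_kill_experiment.yaml chaosengine.yaml
```

Deleting the engine before re-applying it avoids relying on an in-place update of an engine whose resources were created from the older experiment.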
Hi @ispeakc0de, I tried recreating fresh YAMLs for the Chaos Experiment and Chaos Engine. #1 It doesn't appear to initiate or trigger any attacks, nor schedule any container-kill-xxxx-xxxx pods. I'd appreciate it if you could advise where the problem is. Thanks. I've attached the YAMLs below for reference: chaoseng1.txt 2nd Iteration: chaoseng2.txt Regards,
Hi @Vijay5775, I apologize for the late response. Is your application annotated? If not, please annotate the application first, because in your chaosengine manifest, the
If you don't want to annotate the application, the alternative is to change the value of
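Annotating the target application is typically done with `kubectl annotate`, using the `litmuschaos.io/chaos` annotation key described in the Litmus docs. A minimal sketch, where the helper `annotate_cmd`, the deployment name `my-app`, and the namespace `my-namespace` are placeholders rather than values from this thread:

```shell
#!/bin/sh
# Build the kubectl command that marks a deployment as a permitted chaos target.
# The helper name, deployment, and namespace are hypothetical placeholders.
annotate_cmd() {
  deployment="$1"
  namespace="$2"
  printf 'kubectl annotate deploy/%s litmuschaos.io/chaos="true" -n %s\n' "$deployment" "$namespace"
}

# Print the command for review; pipe it to sh to actually run it.
annotate_cmd my-app my-namespace
```

After annotating, `kubectl describe deploy/<name> -n <namespace>` should list the annotation on the deployment.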
Support for the containerd and CRI-O runtimes has been enhanced in the latest 1.8.0 release (ref: release notes). Also supported is the ability to inject chaos without defining chaos annotations on the target deployment (via
I'm trying to run chaos tests using Litmus on OCP v4.3. I've followed all the steps outlined here -> https://docs.litmuschaos.io/docs/openshift-litmus/
The output of each command outlined at the above URL appeared as expected. But when I try to run the experiment, it fails. I've only tried the first experiment, 'container-kill', and unfortunately can't get the container killed.
Additionally, I have elevated the privileges of the Service Account I created, 'container-kill-sa', by adding it to the 'anyuid' and 'privileged' SCCs, but still can't get it to work.
One thing I did notice is that it's triggering a 'pumba-sig-kill' container to initiate the attack (as attached; the container I'm targeting is highlighted as well). If Litmus is based on Pumba, then I doubt it will work on OCP v4.3, as the Pumba developer has already confirmed that the code doesn't support CRI-O based runtime environments (which OCP v4.3 runs on; this has been the case since OCP v3.7, i.e. starting from OCP v3.7, the runtime environment is CRI-O).
Can someone please confirm whether the above is the reason the attacks are failing? If not, is there a way to get this resolved? Many thanks.
P.S. I also can't figure out anything from the log (as attached).