Fix unstable Centos-9 and OpenSUSE leap smoke tests #2642

HannesWell · 2024-12-06T18:54:02Z

At the moment the smoke tests running on CentOS-9 and OpenSUSE Leap are unstable and often fail or time out:
https://ci.eclipse.org/releng/job/SmokeTests/job/Start-smoke-tests/

This issue is an umbrella to investigate and fix this.
The first thing to notice is that when running, for example, the smaller teamcore test suite, all smoke tests on all platforms succeed.

Therefore I assume the failure is related to the UI, since by default smoke tests run the ui suite.

Community

I understand reporting an issue to this OSS project does not mandate anyone to fix it. Other contributors may consider the issue, or not, at their own convenience. The most efficient way to get it fixed is that I fix it myself and contribute it back as a good quality patch to the project.

HannesWell · 2024-12-06T18:58:05Z

A pattern that I see regularly on CentOS-9 and OpenSUSE machines is:

Executing command: "pkill" "Xvnc" 
exit
sh: line 31: pkill: command not found

Therefore I wonder if installing procps could solve the failures/timeouts, as suggested in:

https://www.thegeekdiary.com/pkill-command-not-found/

Can anyone with more Linux experience assess this?
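
One way to confirm which process tools are missing inside a test image would be to probe the PATH before the tests start. This is only a minimal sketch: the helper name `missing_commands` and the list of probed commands are assumptions for illustration and are not part of the smoke-test scripts.

```python
import shutil


def missing_commands(names):
    """Return the commands from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # 'pkill' is the command reported as missing in the log above.
    # 'pgrep' and 'ps' are assumed extras; they normally ship in the
    # same procps package, so they make a cheap cross-check.
    for name in missing_commands(["pkill", "pgrep", "ps"]):
        print(f"not found on PATH: {name}")
```

Running this in each docker image before and after installing procps should show whether the package really provides the missing command there.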

akurtakov · 2024-12-09T19:42:27Z

It will probably remove the warning/error from the log but I doubt it will actually improve the test results (it's still good to see fewer false positives so please do it) as I don't see pkill used in our scripts at all.

The 'procps'-package contains the 'pkill' command which is currently missing according to the smoke-tests logs. Part of #2642

HannesWell · 2024-12-18T23:53:55Z

It will probably remove the warning/error from the log but I doubt it will actually improve the test results (it's still good to see fewer false positives so please do it) as I don't see pkill used in our scripts at all.

In the latest smoke-test executions, i.e. since #2678 became available, the CentOS and OpenSUSE tests seem to be much more stable. The OpenSUSE tests seem to have stabilized a bit earlier, but the CentOS tests seem to have succeeded only since that change.

Initially my theory was that without pkill the Xvnc session somehow survived between the multiple builds on the same platform with different Java versions. But since each build runs in a Docker container, each run should actually be fully independent.
Or can Xvnc somehow leak through the host machine?
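
To check whether an Xvnc process is really left over when a new build starts, one could list matching processes at the beginning of a run. The following is a rough sketch, assuming a Linux `/proc` filesystem; `find_processes` is a hypothetical helper and not something the smoke-test scripts contain.

```python
from pathlib import Path


def find_processes(name, proc_root="/proc"):
    """Return the PIDs whose command name equals `name` (Linux only)."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


if __name__ == "__main__":
    print("Xvnc PIDs visible here:", find_processes("Xvnc"))
```

Inside a container with its own PID namespace this only sees the container's processes, so answering the leak question would probably require running it on the host as well.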

Besides the good news about OpenSUSE and CentOS, we also have the bad news that the linux.riscv tests seem to be unstable too. Maybe something is missing on that machine as well.
I will look into them later.

Always archiving artifacts helps to debug failures. Not using a 'main' agent just to start all test configurations running on other machines saves precious resources in the Jenkins instance while waiting for the tests to complete. Additionally, simplify the post-build result notifications. Part of #2642

akurtakov · 2025-02-21T06:39:05Z

Is there still something to be done, or can it be closed?

HannesWell · 2025-02-22T10:12:29Z

Is there still something to be done, or can it be closed?

The CentOS-9 and OpenSUSE tests seem to be rock-solid now, but as mentioned the Linux RISC-V tests have two failures (for both Java versions, 21 and 23).

org.eclipse.ui.tests.api.IEditorRegistryTest.testFindExternalEditor:

java.lang.AssertionError: The OS should have at least one external editor
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.eclipse.ui.tests.api.IEditorRegistryTest.testFindExternalEditor(IEditorRegistryTest.java:154)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)

org.eclipse.ui.tests.progress.ProgressViewTests.testItemOrder:

Wrong job order: arrays first differed at element [0]; expected:<1. User Job(3832)> but was:<2. High Priority Job(3833)>
	at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78)
	at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28)
	at org.junit.Assert.internalArrayEquals(Assert.java:534)
	at org.junit.Assert.assertArrayEquals(Assert.java:285)
	at org.eclipse.ui.tests.progress.ProgressViewTests.testItemOrder(ProgressViewTests.java:189)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
Caused by: java.lang.AssertionError: expected:<1. User Job(3832)> but was:<2. High Priority Job(3833)>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:120)
	at org.junit.Assert.assertEquals(Assert.java:146)
	at org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8)
	at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76)
	... 5 more

The first failure looks like something is missing on the test machine. @akurtakov can you advise how to solve that?
The second failure doesn't seem to happen every time. Maybe it's related to the first, or the test should be a bit more flexible, or it's something else.
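
For the first failure, a possible diagnostic is to count the applications registered on the machine. This sketch rests on an assumption the thread does not confirm: that the external editors the test looks for come from freedesktop desktop entries (`*.desktop` files under the XDG data directories). The helper `desktop_entries` is hypothetical.

```python
import os
from pathlib import Path


def desktop_entries(data_dirs):
    """Return sorted names of *.desktop files under <dir>/applications."""
    names = set()
    for data_dir in data_dirs:
        app_dir = Path(data_dir) / "applications"
        if app_dir.is_dir():
            names.update(p.name for p in app_dir.glob("*.desktop"))
    return sorted(names)


if __name__ == "__main__":
    # Fall back to the XDG default search path when XDG_DATA_DIRS is unset.
    raw = os.environ.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    entries = desktop_entries(raw.split(":"))
    print(f"{len(entries)} desktop entries found")
```

If the RISC-V machine reports zero entries while the other test machines report some, that would support the "something is missing" theory; otherwise the cause is likely elsewhere.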

HannesWell added the bug and help wanted labels Dec 6, 2024

HannesWell added a commit to HannesWell/eclipse.platform.releng.aggregator that referenced this issue Dec 17, 2024

[Build] Install 'procps' package in all docker-images for smoke-tests

7b263b8

Part of eclipse-platform#2642

HannesWell mentioned this issue Dec 17, 2024

Install git and procps package in all docker-images #2678

Merged

HannesWell added a commit that referenced this issue Dec 18, 2024

[Build] Install 'procps' package in all docker-images for smoke-tests

4a0e862

The 'procps'-package contains the 'pkill' command which is currently missing according to the smoke-tests logs. Part of #2642

HannesWell mentioned this issue Dec 19, 2024

[Build] Always archive artifacts in smoke-tests and use no main agent #2687

Merged
