Fix unstable Centos-9 and OpenSUSE leap smoke tests #2642

Open
1 task done
HannesWell opened this issue Dec 6, 2024 · 5 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@HannesWell
Member

At the moment, the smoke tests running on CentOS 9 and openSUSE Leap are unstable and often fail or time out:
https://ci.eclipse.org/releng/job/SmokeTests/job/Start-smoke-tests/

This is an umbrella issue to investigate and fix the instability.
The first thing to notice is that when running, for example, the smaller teamcore test suite, all smoke tests succeeded on all platforms.

Therefore, I assume the failure is related to the UI, since by default the smoke tests run the ui suite.

Community

  • I understand reporting an issue to this OSS project does not mandate anyone to fix it. Other contributors may consider the issue, or not, at their own convenience. The most efficient way to get it fixed is that I fix it myself and contribute it back as a good quality patch to the project.
HannesWell added the bug and help wanted labels on Dec 6, 2024
@HannesWell
Member Author

A pattern that I see regularly on CentOS-9 and OpenSUSE machines is:

Executing command: "pkill" "Xvnc" 
exit
sh: line 31: pkill: command not found

Therefore I wonder if installing procps could solve the failures/timeouts? As suggested in:

Can anyone with more Linux experience assess this?
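
To make a missing tool visible before the tests start, a pre-flight check along the following lines could be run on each test machine. This is only a minimal sketch: the function name `missing_commands` is hypothetical, and apart from `pkill` (taken from the log above) the list of expected commands is an assumption, not something the smoke-test scripts are known to require.

```python
import shutil


def missing_commands(names):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # "pkill" is the command reported missing in the log; "pgrep" is a
    # hypothetical addition, since both usually ship in the same package.
    missing = missing_commands(["pkill", "pgrep"])
    if missing:
        print("Missing commands:", ", ".join(missing))
    else:
        print("All expected commands are available.")
```

Running this inside the test container would show whether installing procps actually made `pkill` available there.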

@akurtakov
Member

It will probably remove the warning/error from the log, but I doubt it will actually improve the test results, as I don't see pkill used in our scripts at all. It's still good to see fewer false positives, so please do it.

HannesWell added a commit to HannesWell/eclipse.platform.releng.aggregator that referenced this issue Dec 17, 2024
The 'procps'-package contains the pkill command which is currently
missing according to the smoke-tests logs.

Part of eclipse-platform#2642
HannesWell added a commit to HannesWell/eclipse.platform.releng.aggregator that referenced this issue Dec 18, 2024
The 'procps'-package contains the pkill command which is currently
missing according to the smoke-tests logs.

Part of eclipse-platform#2642
HannesWell added a commit to HannesWell/eclipse.platform.releng.aggregator that referenced this issue Dec 18, 2024
The 'procps'-package contains the 'pkill' command which is currently
missing according to the smoke-tests logs.

Part of eclipse-platform#2642
HannesWell added a commit that referenced this issue Dec 18, 2024
The 'procps'-package contains the 'pkill' command which is currently
missing according to the smoke-tests logs.

Part of #2642
@HannesWell
Member Author

> It will probably remove the warning/error from the log but I doubt it will actually improve the test results (it's still good to see less false positives so please do it) as I don't see pkill used in our scripts at all.

In the latest smoke-test executions, i.e. since #2678 became available, the CentOS and openSUSE tests seem to be much more stable. The openSUSE tests seem to have stabilized a bit earlier, but the CentOS tests seem to have succeeded only since that change.

My initial theory was that, without pkill, the Xvnc session somehow survived between the multiple builds on the same platform with different Java versions. But since each build runs in a Docker container, each run should actually be fully independent.
Or can Xvnc somehow leak through the host machine?
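
One way to test this theory would be to look for a leftover Xvnc process at the start of a build. The sketch below does that by reading procfs directly, so it does not depend on `pkill` or `pgrep` being installed. The function name `find_processes` and the `proc_root` parameter are hypothetical, and the process name "Xvnc" is taken from the `pkill` call in the log. Note that inside a container with its own PID namespace, only that container's processes would be visible.

```python
from pathlib import Path


def find_processes(name, proc_root="/proc"):
    """Return the sorted PIDs whose command name equals `name`."""
    pids = []
    root = Path(proc_root)
    if not root.is_dir():
        return pids  # no procfs available, e.g. not a Linux system
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories describe processes
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited or is not readable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


if __name__ == "__main__":
    leftovers = find_processes("Xvnc")
    print("Xvnc processes found:", leftovers)
```

If this reports no PIDs at the start of each build, a surviving session is unlikely to be the cause.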

Besides the good news about openSUSE and CentOS, there is also bad news: the linux.riscv tests seem to be unstable too. Maybe something is missing on that machine as well.
I will look into them later.

HannesWell added a commit to HannesWell/eclipse.platform.releng.aggregator that referenced this issue Dec 19, 2024
Always archiving artifacts help to debug failures.
Avoiding to use a 'main'-agent to just start all test-configurations
running on other machines saves precious resources in the Jenkins
instance while waiting for the tests to complete.

Additionally simplify post build result notifications.

Part of
eclipse-platform#2642
HannesWell added a commit that referenced this issue Dec 19, 2024
Always archiving artifacts help to debug failures.
Avoiding to use a 'main'-agent to just start all test-configurations
running on other machines saves precious resources in the Jenkins
instance while waiting for the tests to complete.

Additionally simplify post build result notifications.

Part of
#2642
@akurtakov
Member

Is there still something to be done, or can it be closed?

@HannesWell
Member Author

> Is there still smth to be done or it can be closed?

The CentOS 9 and openSUSE tests seem to be rock-solid now, but as mentioned, the Linux RISC-V tests have two failures (for both Java versions, 21 and 23).

  • org.eclipse.ui.tests.api.IEditorRegistryTest.testFindExternalEditor
java.lang.AssertionError: The OS should have at least one external editor
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.eclipse.ui.tests.api.IEditorRegistryTest.testFindExternalEditor(IEditorRegistryTest.java:154)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
  • org.eclipse.ui.tests.progress.ProgressViewTests.testItemOrder:
Wrong job order: arrays first differed at element [0]; expected:<1. User Job(3832)> but was:<2. High Priority Job(3833)>
	at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:78)
	at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:28)
	at org.junit.Assert.internalArrayEquals(Assert.java:534)
	at org.junit.Assert.assertArrayEquals(Assert.java:285)
	at org.eclipse.ui.tests.progress.ProgressViewTests.testItemOrder(ProgressViewTests.java:189)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
Caused by: java.lang.AssertionError: expected:<1. User Job(3832)> but was:<2. High Priority Job(3833)>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:120)
	at org.junit.Assert.assertEquals(Assert.java:146)
	at org.junit.internal.ExactComparisonCriteria.assertElementsEqual(ExactComparisonCriteria.java:8)
	at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:76)
	... 5 more

The first failure looks like something is missing on the test machine. @akurtakov, can you give some advice on how to solve that?
The second failure doesn't seem to happen every time. Maybe it's related to the first one, maybe the test should be a bit more flexible, or maybe it's something else.
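
To narrow down the first failure, it may help to check whether the test machine has any registered desktop applications at all. The sketch below rests on an assumption that is not confirmed in this thread: that the external editor lookup on Linux ultimately depends on `.desktop` entries in the XDG data directories. The function names `xdg_data_dirs` and `desktop_entries` are hypothetical.

```python
import os
from pathlib import Path


def xdg_data_dirs(env=None):
    """Return the XDG data directories, using the specification's defaults."""
    env = os.environ if env is None else env
    home = env.get("XDG_DATA_HOME") or os.path.join(
        env.get("HOME", ""), ".local/share")
    system = env.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    return [home] + [d for d in system.split(":") if d]


def desktop_entries(data_dirs):
    """Return all *.desktop files found under <data_dir>/applications."""
    found = []
    for data_dir in data_dirs:
        app_dir = Path(data_dir) / "applications"
        if app_dir.is_dir():
            found.extend(sorted(str(p) for p in app_dir.glob("*.desktop")))
    return found


if __name__ == "__main__":
    entries = desktop_entries(xdg_data_dirs())
    # Assumption: an empty result would be consistent with the
    # "at least one external editor" assertion failing.
    print(f"{len(entries)} desktop entries found")
```

If the count is zero on the RISC-V machine but not on the other machines, installing any application that ships a desktop entry might be enough, though that would need to be verified.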
