Attempt to fix flaky Harbor E2E setup #20641

Open · wants to merge 3 commits into master

Conversation

NouemanKHAL (Member)

What does this PR do?

The Harbor E2E tests failed recently with the following error:

E           datadog_checks.dev.errors.RetryError: Result: None
E           Error: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /api/v2.0/users/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff4b4ff1f70>: Failed to establish a new connection: [Errno 111] Connection refused'))
E           Function: create_simple_user, Args: (), Kwargs: {}

This PR tries to address that issue by increasing the wait time, giving the Harbor users endpoint more time to become healthy.

Motivation

Failing job: https://github.com/DataDog/integrations-core/actions/runs/16020744131/job/45196949202

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

codecov bot commented Jul 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.04%. Comparing base (0cc4a1c) to head (f0ad2b8).

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
confluent_platform ?
harbor 89.04% <ø> (-0.61%) ⬇️
hive ?
hivemq ?
hudi ?
ignite ?
jboss_wildfly ?
kafka ?
presto ?
solr ?
tomcat ?
weblogic ?

Flags with carried forward coverage won't be shown.

@@ -49,8 +48,7 @@ def dd_environment(e2e_instance):
     expected_log = "http server Running on" if HARBOR_VERSION < [1, 10, 0] else "API server is serving at"
     conditions = [
         CheckDockerLogs(compose_file, expected_log, wait=3),
-        lambda: time.sleep(4),
-        WaitFor(create_simple_user),
+        WaitFor(create_simple_user, wait=5),

Contributor

Is there a reason to believe that one more second is enough? Just curious.

Member Author

class WaitFor(LazyFunction):
    def __init__(
        self,
        func,  # type: Callable
        attempts=60,  # type: int
        wait=1,  # type: int
        args=(),  # type: Tuple
        kwargs=None,  # type: Dict
    ):

This actually increases the waiting time by 4 seconds for every attempt (60 attempts by default).
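
For context, a minimal sketch of the retry loop being described (illustrative only; the actual WaitFor.__call__ in datadog_checks.dev may differ in how it handles results and which error it raises):

import time


class RetryError(Exception):
    pass


def wait_for(func, attempts=60, wait=1, args=(), kwargs=None):
    # Illustrative sketch: call func up to `attempts` times, sleeping `wait`
    # seconds between tries, so the worst case is roughly attempts * wait seconds.
    kwargs = kwargs or {}
    last_result = None
    for _ in range(attempts):
        try:
            last_result = func(*args, **kwargs)
        except Exception:
            last_result = None
        if last_result not in (False, None):
            return last_result
        time.sleep(wait)
    raise RetryError('Result: {}'.format(last_result))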

Contributor

Oooh true, I checked how WaitFor works but missed that the sleep in there happens on each loop iteration. Seems like a crazy increase though, we will go from 64 to 240 seconds max haha, hopefully 4 minutes is enough.

Member Author

I figured it's pointless to increase it just a bit every time it fails; I'd rather wait as much as possible since we have no choice. If it still fails with this big timeout, then we have a different problem and we might need to consider switching to exponential retry backoff.
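
For reference, a minimal sketch of what exponential retry backoff could look like here (a hypothetical wait_for_with_backoff helper, not an existing datadog_checks.dev API):

import time


def wait_for_with_backoff(func, attempts=8, initial_wait=1, max_wait=60):
    # Hypothetical sketch: sleep between attempts, doubling the delay after
    # each failure and capping it at max_wait, instead of a fixed interval.
    wait = initial_wait
    for _ in range(attempts):
        try:
            if func() not in (False, None):
                return True
        except Exception:
            pass
        time.sleep(wait)
        wait = min(wait * 2, max_wait)
    raise RuntimeError('Condition was not met after {} attempts'.format(attempts))

With initial_wait=1 and max_wait=60, the sleeps between attempts would grow 1, 2, 4, 8, 16, 32, 60, 60 seconds, so most of the waiting only happens when the endpoint is genuinely slow to come up.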
