Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky Detection bot is too spammy #6044

Open
malfet opened this issue Dec 11, 2024 · 8 comments
Open

Flaky Detection bot is too spammy #6044

malfet opened this issue Dec 11, 2024 · 8 comments
Assignees

Comments

@malfet
Copy link
Contributor

malfet commented Dec 11, 2024

As I can't convey it with links, I'll post some pictures (But looks like all issues from pytorch/pytorch#142936 to pytorch/pytorch#143037 are created by bot)
image
image

@malfet
Copy link
Contributor Author

malfet commented Dec 11, 2024

Good context from @eellison pytorch/pytorch#92555

@malfet
Copy link
Contributor Author

malfet commented Dec 11, 2024

Why should we parse the test name from the title rather than a body? I.e. let's move DISABLE XYZ messages to body to allow mass disables in one issue?

@clee2000
Copy link
Contributor

Currently working on making some sort of aggregate

I imagine its in the title to make it easier to search for, but we can just put something on HUD that lets people search

@malfet
Copy link
Contributor Author

malfet commented Dec 11, 2024

I imagine its in the title to make it easier to search for, but we can just put something on HUD that lets people search

I think we can split it 2 steps:

  • Allow disable bot to read info about disabled tests from both title and body
  • Modify flaky issues creation to aggregate if it wants to create 50+ issue into an aggregate (that later can be split manually)

@huydhn
Copy link
Contributor

huydhn commented Dec 12, 2024

A q: do we even need to have an aggregated issue if disabled tests are coming from the bot? There is already https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json that is what CI uses here. Does it make sense to have the following setup?

@clee2000
Copy link
Contributor

A q: do we even need to have an aggregated issue if disabled tests are coming from the bot? There is already https://github.com/pytorch/test-infra/blob/generated-stats/stats/disabled-tests-condensed.json that is what CI uses here. Does it make sense to have the following setup?

I'm worried that disabling tests by putting them directly in the json is too quiet, and we'd slowly accumulate a ton of tests there. While issues are noisy, they are much easier to see

@huydhn
Copy link
Contributor

huydhn commented Dec 12, 2024

I'm worried that disabling tests by putting them directly in the json is too quiet, and we'd slowly accumulate a ton of tests there. While issues are noisy, they are much easier to see

True, that's a good point. On the other hand, if we have guardrail in place to avoid the bot from disable tests too quickly and use the number of disabled tests on HUD metrics place as a monitoring mechanism, it might be ok.

Also it seems useful to know how many tests are disabled by the bot over the total number of disabled tests on HUD metrics

@ZainRizvi
Copy link
Contributor

If we disable tests in group, it would be good to group similar tests together, e.g. those with the same prefix, and put that prefix name in the issue title as well.

That way regular flaky test issues that pop up don't loose the attention they deserve because of being sucked into the vortex of a big group disable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Prioritized
Development

No branches or pull requests

4 participants