feat: add go flaky tests github action #1013

dimitarvdimitrov · 2025-06-05T18:36:38Z

Add core functionality for analyzing test failures from Loki logs.

This PR introduces the foundation of the flaky test detection bot. It is broken up from #998. This is the first PR of 3. The next PRs are dimitarvdimitrov#1 and dimitarvdimitrov#2. They add Git author tracking and GitHub issue management.

Key Features

Loki Integration: Query Loki using LogQL to extract test failures from CI/CD pipelines
Failure Analysis: Parse and aggregate test failures across branches and time periods
Flaky Test Classification: Identify tests that fail on main branch or across multiple branches
Configurable Analysis: Support for time ranges (24h, 7d, etc.) and result limits
Detailed Reporting: Generate JSON reports with failure counts and workflow URLs
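The Loki integration above can be pictured as a request to Loki's `query_range` HTTP endpoint. The following is a minimal sketch under stated assumptions, not the action's actual code: the base URL, the `namespace` label name, the example LogQL string, and the `buildQueryRangeURL` helper are all hypothetical. Only the endpoint path and the `query`, `start`, `end`, `limit`, and `direction` parameters are taken from Loki's HTTP API.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
	"strconv"
	"strings"
	"time"
)

// buildQueryRangeURL is a hypothetical helper that assembles a Loki
// query_range URL. startNs and endNs are Unix timestamps in nanoseconds,
// which is one of the formats Loki accepts for start and end.
func buildQueryRangeURL(baseURL, logQL string, startNs, endNs int64, limit int) (string, error) {
	if limit <= 0 {
		return "", fmt.Errorf("limit must be positive, got %d", limit)
	}
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/loki/api/v1/query_range"

	q := url.Values{}
	q.Set("query", logQL)
	q.Set("start", strconv.FormatInt(startNs, 10))
	q.Set("end", strconv.FormatInt(endNs, 10))
	q.Set("limit", strconv.Itoa(limit))
	q.Set("direction", "backward") // newest entries first
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	end := time.Now()
	start := end.Add(-24 * time.Hour)

	// Assumption: the label name and line filter are illustrative only.
	logQL := `{namespace="cicd-o11y"} |= "--- FAIL"`

	u, err := buildQueryRangeURL("https://loki.example.com", logQL, start.UnixNano(), end.UnixNano(), 100)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u)
}
```

Passing the LogQL string in as a parameter, rather than hard-coding it, is also what would make the query configurable for other Loki setups.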

Why in `grafana/shared-workflows`

This action is designed to work with any Go repository that meets these minimal requirements:

Runs unit tests in GitHub Actions
Logs are stored in loki-ops with the cicd-o11y namespace

This covers virtually all Grafana Labs repositories today. My plan is to deploy this immediately for grafana/mimir and grafana/backend-enterprise.

Future Extensibility

For non-Grafana repos or different Loki setups, the LogQL query can be made configurable via an additional input parameter, enabling adoption across any organization using Loki for CI/CD observability. Let me know if you think we should go down this route.

The action requires no repository-specific configuration: just provide Loki credentials and the repository name, and have the repository cloned locally. It automatically discovers test files, identifies recent contributors, and manages GitHub issues.

Testing

Includes a test suite with real Loki response data and golden file testing for reliable parsing of complex log structures.

Add core functionality for analyzing test failures from Loki logs:

- Query Loki using LogQL for test failure data
- Parse and aggregate test failures across branches
- Classify tests as flaky based on failure patterns
- Generate detailed analysis reports with failure counts and workflow URLs
- Configurable time ranges and result limits

This initial implementation includes stub interfaces for Git and GitHub functionality that will be added in subsequent PRs.
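As an illustration of the configurable time ranges, here is a minimal sketch of parsing values such as `24h` and `7d`. It assumes the input is a plain string; the `parseTimeRange` helper is hypothetical and not taken from this PR. Go's `time.ParseDuration` has no day unit, so the sketch handles a `d` suffix separately and treats a day as exactly 24 hours.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseTimeRange is a hypothetical helper that accepts either a whole number
// of days ("7d") or anything time.ParseDuration understands ("24h", "90m").
func parseTimeRange(s string) (time.Duration, error) {
	var d time.Duration
	if strings.HasSuffix(s, "d") {
		days, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil {
			return 0, fmt.Errorf("invalid time range %q: %w", s, err)
		}
		d = time.Duration(days) * 24 * time.Hour
	} else {
		var err error
		d, err = time.ParseDuration(s)
		if err != nil {
			return 0, fmt.Errorf("invalid time range %q: %w", s, err)
		}
	}
	if d <= 0 {
		return 0, fmt.Errorf("time range %q must be positive", s)
	}
	return d, nil
}

func main() {
	for _, s := range []string{"24h", "7d"} {
		d, err := parseTimeRange(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s covers %.0f hours\n", s, d.Hours())
	}
}
```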

Remove stub implementations and complexity that will be added in later PRs:

- Remove GitClient and GitHubClient interfaces and stubs
- Remove FilePath and RecentCommits from FlakyTest struct
- Remove repository-directory and skip-posting-issues inputs
- Remove author tracking and issue management code paths
- Focus purely on Loki querying, log parsing, and flaky test detection

This creates a clean foundation for PR2 (Git authors) and PR3 (GitHub issues) to build upon without unnecessary complexity in the initial implementation.

- Add MockLokiClient and MockFileSystem for testing
- Test core AnalyzeFailures() functionality with valid Loki response
- Test error handling when Loki client fails
- Test ActionReport() method for both empty and populated reports
- Test utility functions like generateSummary() and FlakyTest.String()
- Use proper Loki response format with stream metadata for test data
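The mocking approach in this commit can be sketched as follows. This is an illustrative outline only: the `lokiClient` interface, its `FetchLogs` method, `mockLokiClient`, and `countFailures` are hypothetical names and signatures, not those of the PR's `MockLokiClient` or `AnalyzeFailures()`. The point is that the analysis code depends on an interface, so a test can supply canned log lines or a canned error without a live Loki.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// lokiClient is a hypothetical interface for whatever fetches log lines.
type lokiClient interface {
	FetchLogs(query string) ([]string, error)
}

// mockLokiClient returns canned data instead of calling Loki.
type mockLokiClient struct {
	lines []string
	err   error
}

func (m mockLokiClient) FetchLogs(string) ([]string, error) {
	return m.lines, m.err
}

// errLokiDown simulates an unavailable Loki backend.
var errLokiDown = errors.New("loki unavailable")

// countFailures counts log lines containing go test's "--- FAIL" marker and
// wraps any client error so callers can see where it came from.
func countFailures(c lokiClient, query string) (int, error) {
	lines, err := c.FetchLogs(query)
	if err != nil {
		return 0, fmt.Errorf("fetch logs: %w", err)
	}
	n := 0
	for _, line := range lines {
		if strings.Contains(line, "--- FAIL") {
			n++
		}
	}
	return n, nil
}

func main() {
	ok := mockLokiClient{lines: []string{"--- FAIL: TestA (0.01s)", "ok  example/pkg 0.2s"}}
	n, err := countFailures(ok, "any query")
	fmt.Println(n, err)

	broken := mockLokiClient{err: errLokiDown}
	_, err = countFailures(broken, "any query")
	fmt.Println(errors.Is(err, errLokiDown))
}
```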

- Added README.md with detailed usage instructions and how-it-works
- Added CHANGELOG.md documenting features and implementation details
- Added run-local.sh script for local development and testing
- Documentation now matches original PR functionality

- Remove GitHub issue creation and management features
- Remove dry run mode references
- Remove GitHub CLI and issue template content
- Documentation now covers Loki analysis and Git author tracking

- Remove Git history analysis and author tracking features
- Remove repository-directory input reference
- Remove run-local.sh script usage, use go run directly
- Documentation now covers only basic Loki analysis functionality

- Update aggregate.go to consider both 'main' and 'master' branches as indicators of flaky tests
- Update README documentation to reflect main/master branch logic
- Add comprehensive tests for master branch detection
- Remove 'progressive PR structure' and 'technical details' from changelog
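The classification rule described in this PR (a test counts as flaky if it fails on `main` or `master`, or if it fails on more than one branch) can be sketched as below. This is a simplified illustration under assumptions: the `failure` type and the `flakyTests` and `isDefaultBranch` helpers are hypothetical and do not reproduce the contents of aggregate.go.

```go
package main

import (
	"fmt"
	"sort"
)

// failure is a hypothetical record of a single failed test run.
type failure struct {
	Test   string
	Branch string
}

// isDefaultBranch reports whether a branch is treated as the default branch.
func isDefaultBranch(branch string) bool {
	return branch == "main" || branch == "master"
}

// flakyTests returns the sorted names of tests that failed on a default
// branch or on more than one distinct branch.
func flakyTests(failures []failure) []string {
	branchesByTest := map[string]map[string]struct{}{}
	for _, f := range failures {
		if branchesByTest[f.Test] == nil {
			branchesByTest[f.Test] = map[string]struct{}{}
		}
		branchesByTest[f.Test][f.Branch] = struct{}{}
	}

	var flaky []string
	for test, branches := range branchesByTest {
		isFlaky := len(branches) > 1
		for branch := range branches {
			if isDefaultBranch(branch) {
				isFlaky = true
			}
		}
		if isFlaky {
			flaky = append(flaky, test)
		}
	}
	sort.Strings(flaky)
	return flaky
}

func main() {
	failures := []failure{
		{Test: "TestUpload", Branch: "main"},
		{Test: "TestParse", Branch: "feature-a"},
		{Test: "TestParse", Branch: "feature-a"},
		{Test: "TestSync", Branch: "feature-a"},
		{Test: "TestSync", Branch: "feature-b"},
	}
	fmt.Println(flakyTests(failures))
}
```

A test that fails repeatedly on a single feature branch is not reported here, on the reasoning that the branch itself most likely broke it.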

- Update action name and description
- Rename directories and update all file references
- Update module name and build paths
- Update documentation and examples

The repository-directory parameter is not needed in the core action and was causing documentation inconsistency.

The github-token parameter is not used in the current implementation and was causing documentation inconsistency.

- Fix prettier formatting for action.yaml, README.md, CHANGELOG.md
- Add missing newlines at end of files
- Remove test output file that shouldn't be committed

dimitarvdimitrov · 2025-06-05T19:36:11Z

A linter is failing and I don't quite understand why. Can a maintainer give me a hand?

dsotirakis · 2025-06-06T14:33:41Z

> A linter is failing and I don't quite understand why. Can a maintainer give me a hand?

#1014

dimitarvdimitrov added 14 commits June 5, 2025 15:42

Remove test files

070faba

Format Go code with gofmt

3e9224c

Remove PR3-related features from documentation

97c3804

- Remove GitHub issue creation and management features
- Remove dry run mode references
- Remove GitHub CLI and issue template content
- Documentation now covers Loki analysis and Git author tracking

Remove PR2-related features from documentation

9a29cc3

- Remove Git history analysis and author tracking features
- Remove repository-directory input reference
- Remove run-local.sh script usage, use go run directly
- Documentation now covers only basic Loki analysis functionality

Update to new loki structure

98d3dc7

Add .gitignore

87a1182

Use tests from before

81bcf3b

Remove github client

13e2d1e

Rename action from analyze-test-failures to go-flaky-tests

7f53342

- Update action name and description
- Rename directories and update all file references
- Update module name and build paths
- Update documentation and examples

dimitarvdimitrov requested a review from a team as a code owner June 5, 2025 18:36

dimitarvdimitrov added 2 commits June 5, 2025 20:52

Remove repository-directory parameter from local script

085c861

The repository-directory parameter is not needed in the core action and was causing documentation inconsistency.

Remove github-token parameter from local script

d660967

The github-token parameter is not used in the current implementation and was causing documentation inconsistency.

dimitarvdimitrov mentioned this pull request Jun 5, 2025

feat: add git history analysis to go flaky tests dimitarvdimitrov/shared-workflows#1

Open

dimitarvdimitrov added 2 commits June 5, 2025 21:15

Fix linting issues

6b4813a

- Fix prettier formatting for action.yaml, README.md, CHANGELOG.md
- Add missing newlines at end of files
- Remove test output file that shouldn't be committed

Remove _actual.json files

60c2345

dimitarvdimitrov mentioned this pull request Jun 5, 2025

feat: add new analyze-test-failures github action #998

Closed

dimitarvdimitrov added 2 commits June 5, 2025 21:28

Run prettier again (?)

5291d10

Remove test files

67cd2b7
