Filter Pull Requests Server-Side with GitHub GraphQL API #22617

Open
devonboyer opened this issue Apr 9, 2025 · 4 comments · May be fixed by #22623
Labels
enhancement New feature or request

Comments

@devonboyer

devonboyer commented Apr 9, 2025

Summary

The page size for listing pull requests from GitHub is hardcoded, which makes refreshing ApplicationSets that list pull requests from large repos very slow when there are thousands of open PRs.

Motivation

At my company we use the ArgoCD pull request generator to deploy preview environments for GitHub PRs, which we then use for integration tests. The repository we use is very large and has thousands of open pull requests from many different teams at any given time.

For example, a repository with heavy pull request volume might have over 4000 open PRs at any one time. Listing them all currently takes 40 round trips, which is noticeably slow for a developer waiting on their preview environment.
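The 40 round trips follow from ceiling division of the open PR count by the page size. A minimal sketch of that arithmetic, assuming a page size of 100 (implied by 4000 PRs taking 40 requests); `roundTrips` is a hypothetical helper, not ArgoCD code:

```go
package main

import "fmt"

// roundTrips returns how many paginated requests are needed to list
// total items when each response holds at most pageSize items
// (ceiling division). pageSize of 100 is an assumption in main below.
func roundTrips(total, pageSize int) int {
	if total <= 0 {
		return 1 // even an empty listing costs one request to find out
	}
	return (total + pageSize - 1) / pageSize
}

func main() {
	// 4000 open PRs at an assumed 100 PRs per page.
	fmt.Println(roundTrips(4000, 100))
}
```

Because the cost grows linearly with the number of open PRs, a single extra PR past a page boundary adds a whole request.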

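Average reconcile time of the ApplicationSet controller in our production instance (total reconcile seconds divided by reconcile count, over a 60-minute window):

sum(rate(controller_runtime_reconcile_time_seconds_sum{namespace=~"argocd-production", controller="applicationset"}[60m]))
/
sum(rate(controller_runtime_reconcile_time_seconds_count{namespace=~"argocd-production", controller="applicationset"}[60m]))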

Proposal

Replace the REST API in the pull request GithubService with the GraphQL API so that pull requests are filtered by label on the server side. This is significantly faster and needs far fewer round trips, which would make the pull request generator usable for large repos with high pull request volume.

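// githubLabel is assumed here to select only the label name.
type githubLabel struct {
	Name githubv4.String
}

var query struct {
	Search struct {
		Nodes []struct {
			// Search results are a union type; the inline fragment
			// selects these fields only for PullRequest nodes.
			PullRequest struct {
				Number      githubv4.Int
				Title       githubv4.String
				HeadRefName githubv4.String
				BaseRefName githubv4.String
				HeadRefOid  githubv4.String
				Labels      struct {
					Nodes []githubLabel
				} `graphql:"labels(first: 10)"`
				Author struct {
					Login githubv4.String
				}
			} `graphql:"... on PullRequest"`
		}
		PageInfo struct {
			EndCursor   githubv4.String
			HasNextPage bool
		}
	} `graphql:"search(query: $query, type: ISSUE, first: 100, after: $after)"`
}

// Open PRs in the repo that carry every configured label.
queryString := fmt.Sprintf("repo:%s/%s is:pr is:open", g.owner, g.repo)
for _, label := range g.labels {
	queryString += fmt.Sprintf(" label:\"%s\"", label)
}

variables := map[string]interface{}{
	"query": githubv4.String(queryString),
	"after": (*githubv4.String)(nil), // nil requests the first page
}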
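The search string above can be pulled out into a standalone function that is easy to unit-test. A minimal sketch; `buildSearchQuery` and the owner/repo/label values are hypothetical, and the comment on AND semantics reflects our reading of GitHub's search syntax rather than anything stated in this issue:

```go
package main

import (
	"fmt"
	"strings"
)

// buildSearchQuery assembles a GitHub search string restricted to open
// pull requests in one repository. Each label becomes its own
// label:"..." qualifier; separate label qualifiers are combined with
// AND by GitHub search, so a PR must carry every label to match.
func buildSearchQuery(owner, repo string, labels []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "repo:%s/%s is:pr is:open", owner, repo)
	for _, label := range labels {
		// Quote the label so names containing spaces stay one term.
		fmt.Fprintf(&b, ` label:"%s"`, label)
	}
	return b.String()
}

func main() {
	fmt.Println(buildSearchQuery("example-org", "example-repo", []string{"preview", "team a"}))
}
```

Using a `strings.Builder` avoids re-allocating the string on every `+=`, though with a handful of labels either approach is fine.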
@devonboyer devonboyer added the enhancement New feature or request label Apr 9, 2025
@crenshaw-dev
Member

Seems alright, but it is a bit of a band-aid. Is there no additional filter we could add to limit the size of the response?

@devonboyer
Author

devonboyer commented Apr 9, 2025

Seems alright, but it is a bit of a band-aid. Is there no additional filter we could add to limit the size of the response?

For GitHub specifically, it is possible to use the Search issues and pull requests API to filter pull requests by label server-side, which is very fast. However, it returns only issue-level data such as the pull request number, so getting the necessary details about each PR (e.g. the head SHA) would require N more API calls.

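// Open PRs in the repo that carry every configured label.
query := fmt.Sprintf("repo:%s/%s is:pr is:open", g.owner, g.repo)
for _, label := range g.labels {
	query += fmt.Sprintf(" label:\"%s\"", label)
}

// Request the maximum of 100 results per page.
opts := &github.SearchOptions{
	ListOptions: github.ListOptions{
		PerPage: 100,
	},
}
for {
	// Results are issue objects; PR details need a follow-up call per PR.
	result, resp, err := g.client.Search.Issues(ctx, query, opts)
	...
}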

For repos with many irrelevant PRs and only a few relevant ones, this would be much faster, but for smaller repos (which I assume are more common) the existing code would be much faster. There seems to be no clear general solution.
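The trade-off can be made concrete by counting API requests. A rough sketch under stated assumptions: 100 results per page for both endpoints, and exactly one extra request per matching PR to fetch its details. The PR counts in `main` are illustrative, not measurements, and the helper names are hypothetical:

```go
package main

import "fmt"

const pageSize = 100 // assumed page size for both endpoints

// pages is ceiling division; an empty result still costs one request.
func pages(n int) int {
	if n <= 0 {
		return 1
	}
	return (n + pageSize - 1) / pageSize
}

// listCalls models the existing approach: list every open PR (full PR
// objects, so no follow-up calls) and filter by label client-side.
func listCalls(openPRs int) int { return pages(openPRs) }

// searchCalls models REST search: labels are filtered server-side, but
// each match needs one extra request for details such as the head SHA.
func searchCalls(matching int) int { return pages(matching) + matching }

func main() {
	// Large repo, few relevant PRs: search needs fewer requests.
	fmt.Println(listCalls(4000), searchCalls(10))
	// Small repo where most PRs match: listing needs fewer requests.
	fmt.Println(listCalls(150), searchCalls(120))
}
```

The per-match term is what makes REST search lose on small repos; the GraphQL query later in this thread avoids it because `HeadRefOid` comes back in the search results themselves.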

Increasing the page size is definitely a band-aid, but it might be the only reasonable thing that can be tuned with the REST API.

I will explore the GraphQL API instead.

@devonboyer
Author

devonboyer commented Apr 9, 2025

I tested the GraphQL API in place of the REST API to list pull requests and it appears to be significantly faster with the following query using github.com/shurcooL/githubv4:

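// githubLabel is assumed here to select only the label name.
type githubLabel struct {
	Name githubv4.String
}

var query struct {
	Search struct {
		Nodes []struct {
			// Search results are a union type; the inline fragment
			// selects these fields only for PullRequest nodes.
			PullRequest struct {
				Number      githubv4.Int
				Title       githubv4.String
				HeadRefName githubv4.String
				BaseRefName githubv4.String
				HeadRefOid  githubv4.String
				Labels      struct {
					Nodes []githubLabel
				} `graphql:"labels(first: 10)"`
				Author struct {
					Login githubv4.String
				}
			} `graphql:"... on PullRequest"`
		}
		PageInfo struct {
			EndCursor   githubv4.String
			HasNextPage bool
		}
	} `graphql:"search(query: $query, type: ISSUE, first: 100, after: $after)"`
}

// Open PRs in the repo that carry every configured label.
queryString := fmt.Sprintf("repo:%s/%s is:pr is:open", g.owner, g.repo)
for _, label := range g.labels {
	queryString += fmt.Sprintf(" label:\"%s\"", label)
}

variables := map[string]interface{}{
	"query": githubv4.String(queryString),
	"after": (*githubv4.String)(nil), // nil requests the first page
}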
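The query is paginated with `PageInfo`: each request passes the previous `EndCursor` as `$after` until `HasNextPage` is false. With github.com/shurcooL/githubv4 that would mean setting `variables["after"] = githubv4.NewString(query.Search.PageInfo.EndCursor)` between calls. To keep the sketch runnable without that module, the request is abstracted behind a hypothetical `fetchFunc`, and `page`/`pageInfo` are simplified stand-ins for the decoded response:

```go
package main

import "fmt"

// pageInfo mirrors the PageInfo fields selected in the query.
type pageInfo struct {
	EndCursor   string
	HasNextPage bool
}

// page is a hypothetical stand-in for one decoded search response.
type page struct {
	Numbers  []int
	PageInfo pageInfo
}

// fetchFunc is a hypothetical stand-in for running the search query
// with $after set to the given cursor (nil on the first request).
type fetchFunc func(after *string) (page, error)

// listAll follows EndCursor until HasNextPage is false.
func listAll(fetch fetchFunc) ([]int, error) {
	var all []int
	var after *string // nil requests the first page
	for {
		p, err := fetch(after)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Numbers...)
		if !p.PageInfo.HasNextPage {
			return all, nil
		}
		cursor := p.PageInfo.EndCursor
		after = &cursor
	}
}

func main() {
	// Fake two-page result set keyed by cursor ("" = first page).
	results := map[string]page{
		"":   {Numbers: []int{1, 2}, PageInfo: pageInfo{EndCursor: "c1", HasNextPage: true}},
		"c1": {Numbers: []int{3}, PageInfo: pageInfo{EndCursor: "c2", HasNextPage: false}},
	}
	fetch := func(after *string) (page, error) {
		key := ""
		if after != nil {
			key = *after
		}
		p, ok := results[key]
		if !ok {
			return page{}, fmt.Errorf("unknown cursor %q", key)
		}
		return p, nil
	}
	nums, err := listAll(fetch)
	fmt.Println(nums, err)
}
```

Because label filtering happens in the search string, the number of pages scales with matching PRs rather than with all open PRs.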

I ran this against our largest repo which has over 4000 open PRs.

On my branch with the GraphQL search query:

github_test.go:82: List function took 808.005791ms

On master:

github_test.go:102: List function took 1m4.252771416s
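For scale, these two timings differ by a factor of roughly 80 on this repository. A small sketch of that arithmetic using the standard library's `time.ParseDuration`, which accepts the duration strings exactly as printed above; `speedup` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"time"
)

// speedup parses two Go duration strings and returns before/after.
func speedup(before, after string) (float64, error) {
	b, err := time.ParseDuration(before)
	if err != nil {
		return 0, err
	}
	a, err := time.ParseDuration(after)
	if err != nil {
		return 0, err
	}
	return b.Seconds() / a.Seconds(), nil
}

func main() {
	// master (REST) vs. branch (GraphQL search) timings from the test runs.
	s, err := speedup("1m4.252771416s", "808.005791ms")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.1fx faster\n", s)
}
```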

I have a PR I can put up with this change and I am happy to test it internally at my company to validate it.

@devonboyer devonboyer changed the title Configurable Page Size for Pull Request Generator Filter Pull Requests Server-Side with GitHub GraphQL API Apr 10, 2025