@domoscargin
Contributor

@domoscargin domoscargin commented Jan 3, 2025

Resolves #13, resolves #12, resolves #10, resolves #9

Example action: https://github.com/alphagov/design-system-github-stats/actions/runs/12678615737

And PR: #30

Changes

Adds vitest for testing

  • Mostly because I wanted to try vitest, and it seems nice and speedy and plays nice with ESM.
  • Tests are a bit patchy at the moment, but I've done a few basic ones

Abstracts Octokit

  • Adds an Octokit helper file

Adds a RepoData class

  • Simplifies the build script and makes it easier to unit test things
  • Responsible for running checks and gathering the repo data

Adds a Result class

  • Stores information on the repo
  • Set up as a class to allow normalisation of data for different outputs
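To illustrate why a class helps with normalisation, here is a minimal sketch of a result object that can emit itself in more than one shape. The field names and the `toObject()` / `toCsvRow()` methods are assumptions made for illustration; the real Result class in this PR may look quite different.

```javascript
// Hypothetical sketch: field and method names are illustrative,
// not the PR's actual Result class.
class Result {
  constructor (repoOwner, repoName) {
    this.repoOwner = repoOwner
    this.repoName = repoName
    this.directDependencies = []
    this.indirectDependencies = []
  }

  // Plain object, suitable for JSON output.
  toObject () {
    return {
      repoOwner: this.repoOwner,
      repoName: this.repoName,
      directDependencies: [...this.directDependencies],
      indirectDependencies: [...this.indirectDependencies]
    }
  }

  // Flattened row for CSV-style output: arrays become semicolon-separated strings.
  toCsvRow () {
    return [
      this.repoOwner,
      this.repoName,
      this.directDependencies.join(';'),
      this.indirectDependencies.join(';')
    ]
  }
}
```

Keeping every output format behind a method means the build script only ever deals with one object per repo, and each format can be unit tested on its own.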

Updates the script behaviour and return data

  • Now writes to file in batches, rather than each time we analyse a repo
  • Uses the RepoData class extensively
  • Uses a free GraphQL query to fetch initial information, so the script runs faster and we stay well clear of our rate limit
  • Now searches for ALL package.json files, in case they're not at root, and/or there's more than one of them
    • For lockfiles, we try root and the same directory as each of the package files
  • Returns a more detailed suite of dependency info
    • Will find and output all govuk-frontend versions which are DIRECT dependencies in directDependencies
    • Will find and output all govuk-frontend versions which are INDIRECT dependencies in indirectDependencies
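The package file and lockfile lookup described above can be sketched roughly as follows. This assumes the repo tree is a flat list of `{ path, type }` entries, and the lockfile names in `LOCKFILE_NAMES` are a guess rather than the set the script actually checks. The function names are hypothetical too.

```javascript
// Sketch only: the tree shape ({ path, type }) and the lockfile names
// are assumptions, not taken from the PR's code.
const LOCKFILE_NAMES = ['package-lock.json', 'yarn.lock']

// Return the path of every package.json in the tree, at any depth.
function findPackageFiles (tree) {
  return tree
    .filter((entry) => entry.type === 'blob')
    .map((entry) => entry.path)
    .filter((path) => path === 'package.json' || path.endsWith('/package.json'))
}

// For one package.json path, list the lockfile paths to try:
// the repo root first, then the package file's own directory.
function lockfileCandidates (packagePath) {
  const slash = packagePath.lastIndexOf('/')
  const dir = slash === -1 ? '' : packagePath.slice(0, slash)
  const dirs = dir === '' ? [''] : ['', dir]
  return dirs.flatMap((d) =>
    LOCKFILE_NAMES.map((name) => (d === '' ? name : `${d}/${name}`))
  )
}
```

For a package file at `apps/web/package.json`, `lockfileCandidates` yields the root lockfile paths followed by the ones under `apps/web/`.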

Dead ends

I did a lot of thinking about how to reduce the script time and API calls. The free GraphQL request for created at, pushed at and latest commit I've implemented is the biggest enhancement, and batch writing to files reduces I/O a lot. But there were some dead ends:

  • I investigated using the dependency graph API, but it's a bit underbaked at the moment, and there's no easy way to get the full dependency tree for dependents. I think GitHub are working on this API though, so there may be methods in the future that allow us to skip most of our script.
  • I also looked at the search API as an easier way to find govuk-frontend dependencies in relevant files, but the rate limit is severely restrictive (10 calls per minute for code search). It could be more usable if we add caching, but it's probably not a goer.
  • I tried to get the repo tree during the initial GraphQL query. Getting the root file list costs very little, but there's no easy way to recurse through the file structure, whereas the REST API takes just 1 call to give you the full structure.
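For reference, an initial GraphQL request for created at, pushed at and latest commit might look something like the sketch below. The exact query in this PR isn't shown in the thread, so the field selection, `REPO_INFO_QUERY`, `parseRepoInfo` and `fetchRepoInfo` are all illustrative, and `octokit` is assumed to be an authenticated Octokit instance.

```javascript
// Sketch under assumptions: the fields the PR actually requests are not
// shown in this thread, so this selection is illustrative.
const REPO_INFO_QUERY = `
  query ($owner: String!, $name: String!) {
    repository(owner: $owner, name: $name) {
      createdAt
      pushedAt
      defaultBranchRef {
        target {
          ... on Commit {
            oid
          }
        }
      }
    }
  }
`

// Flatten the nested GraphQL response into the three values we care about.
// An empty repo has no default branch, so the commit SHA falls back to null.
function parseRepoInfo (response) {
  const repo = response.repository
  return {
    createdAt: repo.createdAt,
    pushedAt: repo.pushedAt,
    latestCommitSha: repo.defaultBranchRef?.target?.oid ?? null
  }
}

// `octokit` is assumed to be an authenticated Octokit instance exposing graphql().
async function fetchRepoInfo (octokit, owner, name) {
  const response = await octokit.graphql(REPO_INFO_QUERY, { owner, name })
  return parseRepoInfo(response)
}
```

Splitting the parsing from the network call keeps `parseRepoInfo` trivially unit-testable without mocking the API.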

@domoscargin domoscargin marked this pull request as ready for review January 3, 2025 11:20
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 11 times, most recently from 4d1d55e to 5c95154 Compare January 5, 2025 21:39
@domoscargin
Contributor Author

domoscargin commented Jan 6, 2025

🐛 BUG

Parent dependencies: these are all null in the latest dataset. Suspect this is because we send the "this is an indirect dependency" examples straight to finding a lockfile version, which will generally be successful and find the govuk-frontend dependency at top level, rather than digging in the sub-levels. Probably makes sense to just fold the parent dependency in with this data, to be honest, rather than have a special path just for "indirect" dependencies.

SOLVED: 47fec86
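For anyone following along, one way to recover parent dependencies from a lockfile is sketched below. This is not the fix in 47fec86. It assumes an npm `package-lock.json` with a `packages` map (lockfileVersion 2 or 3), and `findParents` is a hypothetical helper name.

```javascript
// Sketch only: assumes an npm package-lock.json with a `packages` map
// (lockfileVersion 2 or 3). This is not the fix from 47fec86.
function findParents (lockfile, target = 'govuk-frontend') {
  const marker = 'node_modules/'
  const parents = []
  for (const [path, pkg] of Object.entries(lockfile.packages ?? {})) {
    // The '' entry is the root project itself, which is a direct dependent.
    if (path === '') continue
    const declared = { ...pkg.dependencies, ...pkg.peerDependencies }
    if (target in declared) {
      // 'node_modules/a/node_modules/b' -> 'b'; paths without the marker are kept whole.
      const idx = path.lastIndexOf(marker)
      const name = idx === -1 ? path : path.slice(idx + marker.length)
      parents.push({ name, version: pkg.version, range: declared[target] })
    }
  }
  return parents
}
```

The point is that the top-level `node_modules/govuk-frontend` entry only gives the resolved version; the parent is whichever other entry declares it as a dependency.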

@domoscargin domoscargin force-pushed the bk-repo-data-class branch 5 times, most recently from 0871587 to 7aeda71 Compare January 6, 2025 21:52
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 2 times, most recently from e6009c5 to 11c43e4 Compare January 9, 2025 16:22
@domoscargin domoscargin marked this pull request as draft January 9, 2025 19:48
standardjs is having some governance problems. Neostandard is a fork which is compatible with eslint 9.

At least some contributors at standardjs have indicated they want to fold the changes back in when the governance issue is sorted.
@domoscargin domoscargin changed the base branch from process-all-data to main January 13, 2025 00:37
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 13 times, most recently from 81d9fe7 to 93bb0e0 Compare January 16, 2025 17:18
@domoscargin domoscargin changed the base branch from bk-linting to main January 16, 2025 17:26
@domoscargin domoscargin marked this pull request as ready for review January 16, 2025 17:26
@domoscargin domoscargin mentioned this pull request Feb 1, 2025
16 tasks
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 2 times, most recently from b1d5064 to 1218591 Compare February 5, 2025 20:18
- strict equality
- use 'with' instead of 'assert' for import assertions
- No more dangling commas
Adds helpers/octokit.mjs for queries to the GitHub API.

getRepo() gets some basic repo data
getTree() gets the repo file tree
getFileContent() gets the text content of a file
getRemainingRateLimit() gets the remaining REST API rate limit
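As a rough idea of the shape of one of these helpers, here is a sketch of a rate limit lookup. It is not the code in helpers/octokit.mjs: it assumes an Octokit instance with the REST methods available on `octokit.rest`, and `remainingFromRateLimit` is a hypothetical helper added to keep the parsing testable.

```javascript
// Sketch only, not the code in helpers/octokit.mjs.

// Pull the remaining core REST quota out of a rate limit response body.
function remainingFromRateLimit (data) {
  return data.resources.core.remaining
}

// `octokit` is assumed to be an Octokit instance exposing octokit.rest.
async function getRemainingRateLimit (octokit) {
  const { data } = await octokit.rest.rateLimit.get()
  return remainingFromRateLimit(data)
}
```

Passing the client in as an argument makes it easy to swap in a fake during tests.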
Rather hastily done, and we're not testing the Octokit class just yet, but probably better than nothing
Stores and emits the result. Could be used for validation and normalisation of result data.
Development

Successfully merging this pull request may close these issues.

  • Write tests
  • Minimise I/O operations
  • Minimise rate limiting
  • Handle multiple package.json files

2 participants