Commit 93bb0e0

Update README.md
1 parent 8f8bad6 commit 93bb0e0

1 file changed: +30 -14 lines changed
README.md

@@ -8,29 +8,45 @@ This repo contains a list of repos that we:
88

99
1. **think** are government services
1010
2. **know** are using GOV.UK Frontend directly
11+
3. **think** are using GOV.UK Frontend indirectly
1112

1213
We specifically collect and store the following from our list of repos:
1314

1415
- The name of the repo
1516
- The owner of the repo, i.e. the GitHub user or org
16-
- The version of GOV.UK Frontend being used, as best we are able to work out
17+
- The versions of GOV.UK Frontend being used, as best as we can work out
18+
- The parent dependencies of GOV.UK Frontend, if applicable
19+
- When the repo was created
20+
- When the repo was last updated
1721

18-
We collect the above data by filtering [GOV.UK Frontend's raw dependents data](https://github.com/alphagov/govuk-frontend/network/dependents), which we collect using the [github-dependents-info package](https://github.com/nvuillam/github-dependents-info). Whilst we filter this data, we still store repos rejected from filtering of the raw data to assess the accuracy of our filtering.
22+
We collect the above data by filtering [GOV.UK Frontend's raw dependents data](https://github.com/alphagov/govuk-frontend/network/dependents), which we collect using the [github-dependents-info package](https://github.com/nvuillam/github-dependents-info).
1923

2024
> [!NOTE]
2125
> We **do not** use this information to calculate or collect PII (Personally Identifiable Information) from individual GitHub user accounts.
2226
2327
## How it works
2428

25-
We filter and analyse our dependents data by looping through it and doing the following per loop:
26-
27-
1. We cross reference the repo against an owner list of github owners we know operate government services and a words list to flag repo names that don't look like services eg: "form-prototype", "book-driving-test-beta", "apply-to-vote-tech-demo" etc. This is cross referenced against an allow and deny list which bypass name filtering.
28-
2. We get the repo's latest file tree and look for:
29-
- `lib/usage_stats.js`, which indicates that the repo is an old instance of the GOV.UK Prototype Kit and is therefore unlikely to be a live service
30-
- a `package.json` file. The absence of this indicates that the service can't be using GOV.UK Frontend as a direct dependency
31-
- a lockfile. We only look for either `package-lock.json` or `yarn.lock`
32-
3. We look for the following in `package.json`:
33-
- A dependency on `govuk-frontend`, which we then read the version from
34-
- A dependency on `govuk-prototype-kit`. This is evidence that the repo is a new instance of the Prototype Kit which again indicates that it can't be a service
35-
4. If the version of GOV.UK Frontend we retrieved is using [semver approximation syntax](https://github.com/npm/node-semver#versions) eg "^4.7.0" or "~5.1.0" then we check the lockfile and attempt to ascertain the actual version
36-
5. We build our data and write it to `data/filtered-data.json`
29+
### The RepoData Class
30+
The `RepoData` class manipulates and stores data related to repos. See its JSDoc.
31+
32+
We analyse our dependents by looping through them in `build-filtered-data.mjs` and doing the following per loop:
33+
34+
1. We create a new `RepoData` instance. This is used to check and fetch data from the repo.
35+
2. We create a new `Result` instance. This is used to store and emit the result.
36+
3. We check whether the repo is on our list of repos to ignore and skip the rest of the analysis if so.
37+
4. We run a query on GitHub's GraphQL API to retrieve:
38+
- when the repo was created
39+
- the time of the latest commit to the repo
40+
- the SHA of the latest commit
41+
5. We fetch the repo's file tree using the latest commit SHA.
42+
6. We retrieve the contents of all `package.json` files we can find.
43+
7. We check whether we think the repo is an instance of the GOV.UK Prototype Kit (and unlikely to be a live service) by looking for:
44+
- the `lib/usage_stats.js` file, which indicates that the repo is an old instance of the GOV.UK Prototype Kit, OR
45+
- a dependency on the GOV.UK Prototype Kit in the `package.json` files
46+
8. We look for all direct dependencies on `govuk-frontend` across the `package.json` files.
47+
9. If the version of GOV.UK Frontend we retrieved is using [semver approximation syntax](https://github.com/npm/node-semver#versions), e.g. "^4.7.0" or "~5.1.0", then we check the lockfile and attempt to ascertain the actual version.
48+
10. If we don't find any direct dependencies, we search the lockfile for any indirect dependencies. We currently support `package-lock.json` and `yarn.lock` files.
49+
11. We save the result of the repo analysis to our results.
50+
12. We save the results to a dated file ending `filtered-data.json`.
51+
13. If an error is thrown at any point during analysis, we log that to the result.
52+
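The GraphQL step in the list above asks for three values: the creation time, the time of the latest commit, and its SHA. The sketch below shows one way such a request could be built and its response unpacked. The helper names `buildRepoInfoRequest` and `extractRepoInfo` are hypothetical and do not come from this repo, and reading "latest commit" as the head of the default branch is an assumption. The fields used (`createdAt`, `defaultBranchRef`, `oid`, `committedDate`) are part of GitHub's GraphQL schema.

```javascript
// Hypothetical helper: builds the JSON body for a POST to GitHub's GraphQL endpoint.
// Assumes "latest commit" means the commit at the head of the default branch.
function buildRepoInfoRequest(owner, name) {
  const query = `
    query ($owner: String!, $name: String!) {
      repository(owner: $owner, name: $name) {
        createdAt
        defaultBranchRef {
          target {
            ... on Commit {
              oid
              committedDate
            }
          }
        }
      }
    }`;
  return { query, variables: { owner, name } };
}

// Hypothetical helper: pulls the three values out of a response shaped like the query above.
// An empty repo has no default branch, so the commit fields fall back to null.
function extractRepoInfo(response) {
  const repo = response.data.repository;
  const commit = repo.defaultBranchRef?.target;
  return {
    createdAt: repo.createdAt,
    lastCommitAt: commit?.committedDate ?? null,
    latestCommitSha: commit?.oid ?? null,
  };
}
```

The request body would be sent with an authenticated POST to the GraphQL endpoint, and the returned SHA could then be used to fetch the file tree.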

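The Prototype Kit check, the semver check, and the lockfile lookups described in the list above can be sketched as follows. This is a minimal illustration, not the code in `build-filtered-data.mjs`: every function name is hypothetical, only `package-lock.json` is handled (not `yarn.lock`), and the range check covers only the `^` and `~` prefixes mentioned in the README.

```javascript
// All helper names below are hypothetical and for illustration only.

// True when a package.json version string uses approximation syntax,
// e.g. "^4.7.0" or "~5.1.0", rather than an exact version.
function isApproximateVersion(version) {
  return /^[\^~]/.test(version.trim());
}

// Look up the installed version of a package in a parsed package-lock.json.
// lockfileVersion 2 and 3 use a `packages` map keyed by "node_modules/<name>";
// lockfileVersion 1 uses a `dependencies` map keyed by "<name>".
function resolveFromPackageLock(lock, name) {
  const fromPackages = lock.packages?.[`node_modules/${name}`]?.version;
  const fromDependencies = lock.dependencies?.[name]?.version;
  return fromPackages ?? fromDependencies ?? null;
}

// Find installed packages that themselves depend on `name`:
// candidate parent dependencies when the repo uses `name` indirectly.
function findParents(lock, name) {
  const parents = [];
  for (const [path, entry] of Object.entries(lock.packages ?? {})) {
    if (path === "") continue; // "" is the root project, i.e. a direct dependency
    if (entry.dependencies && name in entry.dependencies) {
      parents.push(path.replace(/^.*node_modules\//, ""));
    }
  }
  return parents;
}

// Prototype Kit heuristic: the old kit ships lib/usage_stats.js,
// the new kit appears as a dependency in a package.json.
function looksLikePrototypeKit(filePaths, packageJsons) {
  if (filePaths.includes("lib/usage_stats.js")) return true;
  return packageJsons.some(
    (pkg) => "govuk-prototype-kit" in (pkg.dependencies ?? {})
  );
}

// Made-up lockfile in which "some-toolkit" pulls in govuk-frontend indirectly.
const exampleLock = {
  lockfileVersion: 3,
  packages: {
    "": { dependencies: { "some-toolkit": "^1.0.0" } },
    "node_modules/some-toolkit": {
      version: "1.2.0",
      dependencies: { "govuk-frontend": "^4.7.0" },
    },
    "node_modules/govuk-frontend": { version: "4.8.0" },
  },
};

console.log(resolveFromPackageLock(exampleLock, "govuk-frontend"));
console.log(findParents(exampleLock, "govuk-frontend"));
```

Keeping each check as a small pure function over already-fetched data makes it possible to test the heuristics without calling the GitHub API.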