README.md
30 additions & 14 deletions
@@ -8,29 +8,45 @@ This repo contains a list of repos that we:

1. **think** are government services
2. **know** are using GOV.UK Frontend directly
+ 3. **think** are using GOV.UK Frontend indirectly

We specifically collect and store the following from our list of repos:

- The name of the repo
- The owner of the repo, i.e. the GitHub user or org
- - The version of GOV.UK Frontend being used, as best we are able to work out
+ - The versions of GOV.UK Frontend being used, as best as we can work out
+ - The parent dependencies of GOV.UK Frontend, if applicable
+ - When the repo was created
+ - When the repo was last updated

- We collect the above data by filtering [GOV.UK Frontend's raw dependents data](https://github.com/alphagov/govuk-frontend/network/dependents), which we collect using the [github-dependents-info package](https://github.com/nvuillam/github-dependents-info). Whilst we filter this data, we still store repos rejected from filtering of the raw data to assess the accuracy of our filtering.
+ We collect the above data by filtering [GOV.UK Frontend's raw dependents data](https://github.com/alphagov/govuk-frontend/network/dependents), which we collect using the [github-dependents-info package](https://github.com/nvuillam/github-dependents-info).

> [!NOTE]
> We **do not** use this information to calculate or collect PII (Personally Identifiable Information) from individual GitHub user accounts

## How it works

- We filter and analyse our dependents data by looping through it and doing the following per loop:
-
- 1. We cross reference the repo against an owner list of github owners we know operate government services and a words list to flag repo names that don't look like services eg: "form-prototype", "book-driving-test-beta", "apply-to-vote-tech-demo" etc. This is cross referenced against an allow and deny list which bypass name filtering.
- 2. We get the repo's latest file tree and look for:
-    - `lib/usage_stats.js`, which indicates that the repo is an old instance of the GOV.UK Prototype Kit and is therefore unlikely to be a live service
-    - a `package.json` file. The absence of this indicates that the service can't be using GOV.UK Frontend as a direct dependency
-    - a lockfile. We only look for either `package-lock.json` or `yarn.lock`
- 3. We look for the following in `package.json`:
-    - A dependency on `govuk-frontend`, which we then read the version from
-    - A dependency on `govuk-prototype-kit`. This is evidence that the repo is a new instance of the Prototype Kit which again indicates that it can't be a service
- 4. If the version of GOV.UK Frontend we retrieved is using [semver approximation syntax](https://github.com/npm/node-semver#versions) eg "^4.7.0" or "~5.1.0" then we check the lockfile and attempt to ascertain the actual version
- 5. We build our data and write it to `data/filtered-data.json`
+ ### The RepoData Class
+ The `RepoData` class manipulates and stores data related to repos. See its JSDoc.
+
+ We analyse our dependents by looping through them in `build-filtered-data.mjs` and doing the following per loop:
+
+ 1. We create a new `RepoData` instance. This is used to check and fetch data from the repo.
+ 2. We create a new `Result` instance. This is used to store and emit the result.
+ 3. We check whether the repo is on our list of repos to ignore and skip the rest of the analysis if so.
+ 4. We run a query on GitHub's GraphQL API to retrieve:
+    - when the repo was created
+    - the time of the latest commit to the repo
+    - the SHA of the latest commit
+ 5. We fetch the repo's file tree using the latest commit SHA
+ 6. We retrieve the contents of all `package.json` files we can find
+ 7. We check whether we think the repo is an instance of the GOV.UK Prototype Kit (and unlikely to be a live service) by looking for:
+    - the `lib/usage_stats.js` file, which indicates that the repo is an old instance of the GOV.UK Prototype Kit, OR
+    - a dependency on the GOV.UK Prototype Kit in the `package.json` files
+ 8. We look for all direct dependencies on `govuk-frontend` across the `package.json` files
+ 9. If the version of GOV.UK Frontend we retrieved is using [semver approximation syntax](https://github.com/npm/node-semver#versions), e.g. "^4.7.0" or "~5.1.0", then we check the lockfile and attempt to ascertain the actual version.
+ 10. If we don't find any direct dependencies, we search the lockfile for any indirect dependencies. We currently support `package-lock.json` or `yarn.lock` files.
+ 11. We save the result of the repo analysis to our results
+ 12. We save the results to a dated file ending `filtered-data.json`
+ 13. If an error is thrown at any point during analysis, we log that to the result.
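
The Prototype Kit check, the search for direct dependencies and the semver approximation check described in the new list could look roughly like the sketch below. It is not the code from this PR: every function name is hypothetical, and it assumes the `package.json` files have already been fetched and parsed into `{ path, content }` objects. The range check only covers the `^` and `~` prefixes the README mentions, not other range syntax such as `>=` or `*`.

```javascript
// All names below are hypothetical and are not taken from the repo's code.

// Assumption: only these two fields are inspected for dependencies.
const DEPENDENCY_FIELDS = ['dependencies', 'devDependencies']

// Collect the declared version of `packageName` from each parsed package.json.
function findDirectDependencies (packageJsons, packageName) {
  const found = []
  for (const { path, content } of packageJsons) {
    for (const field of DEPENDENCY_FIELDS) {
      const version = content[field]?.[packageName]
      if (version !== undefined) {
        found.push({ path, version })
      }
    }
  }
  return found
}

// "^4.7.0" and "~5.1.0" are ranges, so the installed version would have to
// be read from the lockfile instead.
function isApproximateVersion (version) {
  return /^[\^~]/.test(version.trim())
}

// A repo looks like a Prototype Kit instance if it has the old usage stats
// file OR depends on the kit's package in any package.json.
function looksLikePrototypeKit (filePaths, packageJsons) {
  return (
    filePaths.includes('lib/usage_stats.js') ||
    findDirectDependencies(packageJsons, 'govuk-prototype-kit').length > 0
  )
}

// Example input: one package.json with a ranged govuk-frontend dependency.
const examplePackageJsons = [
  { path: 'package.json', content: { dependencies: { 'govuk-frontend': '^4.7.0' } } }
]
const exampleDeps = findDirectDependencies(examplePackageJsons, 'govuk-frontend')
console.log(exampleDeps, isApproximateVersion(exampleDeps[0].version))
```

Keeping each check as a small pure function means it can be exercised against hand-written `package.json` objects without calling the GitHub API.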