Description
I am running MASTER.sh to download all data for the NREL GitHub organization (which has 350 repos), but it is taking a very long time and I'm not sure whether this is normal. For most repositories in the org, the query returns in under a second. The script does, however, appear to be scraping over 4,000 repositories (possibly dependencies?).
For some repositories, it takes much longer, and the script prints warning-like messages such as:
Sending REST query...
Checking response...
HTTP/1.1 202 Accepted
API Status {"limit": 5000, "remaining": 4414, "reset": 1607114323}
Query accepted but not yet processed. Trying again in 3sec...
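The `202 Accepted` response most likely comes from GitHub's repository statistics endpoints (such as `/repos/{owner}/{repo}/stats/contributors`), which answer 202 while GitHub computes the statistics in the background and expect the client to ask again. If that is the source, the delay is expected rather than a failure. A minimal polling sketch under that assumption; `poll_until_ready` and its `fetch` callable (returning a `(status, body)` pair) are hypothetical names, not the scraper's own code:

```python
import time
from typing import Callable, Optional, Tuple

# A fetch callable returns (HTTP status code, parsed JSON body or None).
Fetch = Callable[[], Tuple[int, Optional[object]]]


def poll_until_ready(
    fetch: Fetch,
    max_tries: int = 10,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[object]:
    """Retry while the server answers 202 Accepted (data still being computed).

    Returns the body of the first 200 response, or None if the server is
    still answering 202 after max_tries attempts.
    """
    for attempt in range(max_tries):
        status, body = fetch()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected HTTP status {status}")
        # 202: the request was accepted but the statistics are not ready yet,
        # so wait before asking again (no wait after the final attempt).
        if attempt < max_tries - 1:
            sleep(delay)
    return None
```

Passing `sleep` in as a parameter keeps the helper testable without real waits.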
For a small number of repos, I also get the following error-like message:
GraphQL API error.
[{"path": ["repository", "dependencyGraphManifests"], "locations": [{"line": 1, "column": 244}], "message": "loading"}]
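The `"message": "loading"` error is reported on the `dependencyGraphManifests` field, which suggests GitHub had not finished preparing that repository's dependency graph when the query ran. Assuming that reading is right (it is a guess, not documented behavior), the error is transient, and a client could retry the query instead of skipping the repo. A sketch with hypothetical helpers `is_retryable_loading_error` and `run_with_retry`, where `run_query` returns the decoded GraphQL response:

```python
import json
import time
from typing import Any, Callable, Dict, List


def is_retryable_loading_error(errors: List[Dict[str, Any]]) -> bool:
    """True if the response has errors and every one is a "loading" error.

    Assumption: "loading" means GitHub is still preparing the requested data
    (here, dependencyGraphManifests), so the same query may succeed later.
    """
    return bool(errors) and all(err.get("message") == "loading" for err in errors)


def run_with_retry(
    run_query: Callable[[], Dict[str, Any]],
    max_tries: int = 5,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Re-run a GraphQL query while it fails only with "loading" errors."""
    for attempt in range(max_tries):
        result = run_query()
        errors = result.get("errors")
        if not errors:
            return result
        if not is_retryable_loading_error(errors):
            # A real error (bad query, auth, ...): surface it instead of retrying.
            raise RuntimeError(f"GraphQL API error: {json.dumps(errors)}")
        if attempt < max_tries - 1:
            sleep(delay)
    raise TimeoutError(f"still 'loading' after {max_tries} attempts")
```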
These two messages do not seem to occur together.
The script is still running, and I will let it finish, but I am wondering whether these errors can simply be ignored.
Update: The script has finished, and I can view the data using the Jekyll dev server. However, at least 3 of the 350 repositories appear to have been skipped.
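One way to pin down exactly which repositories were skipped is to compare the organization's full repository list, from GitHub's `GET /orgs/{org}/repos` endpoint (paginated, at most 100 per page), against the names present in the scraped data. A sketch, assuming you extract the `owner/name` strings from the files in explore/github_data yourself, since their layout isn't shown here; `fetch_org_repo_page`, `list_org_repos`, and `find_skipped` are hypothetical helpers:

```python
import json
import os
import urllib.request
from typing import Callable, Iterable, List, Set

API = "https://api.github.com"


def fetch_org_repo_page(org: str, page: int) -> List[dict]:
    """Fetch one page (up to 100 repos) from GET /orgs/{org}/repos."""
    url = f"{API}/orgs/{org}/repos?per_page=100&page={page}&type=all"
    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"token {os.environ['GITHUB_API_TOKEN']}",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_org_repos(
    org: str,
    get_page: Callable[[str, int], List[dict]] = fetch_org_repo_page,
) -> Set[str]:
    """Collect the full_name ("owner/name") of every repo, page by page."""
    names: Set[str] = set()
    page = 1
    while True:
        batch = get_page(org, page)
        if not batch:  # an empty page means we have read past the last repo
            break
        names.update(repo["full_name"] for repo in batch)
        page += 1
    return names


def find_skipped(expected: Iterable[str], scraped: Iterable[str]) -> List[str]:
    """Repos the org lists but the scraped data lacks (case-insensitive match)."""
    seen = {name.lower() for name in scraped}
    return sorted(name for name in expected if name.lower() not in seen)
```

Usage would then be along the lines of `find_skipped(list_org_repos("NREL"), scraped_names)`, where `scraped_names` is whatever list of repo names you pulled out of the generated data.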
Steps to reproduce:
- Remove all data from explore/github_data.
- Remove all repos and orgs from _explore/input_lists.json, and add "NREL" as an org.
- Create a Python environment and install dependencies from requirements.txt
- Set the GITHUB_API_TOKEN environment variable
- Run ./MASTER.sh
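Before a long run, a quick pre-flight check can confirm the token is set and show how much API quota remains. GitHub's `GET /rate_limit` endpoint reports the same `limit`/`remaining`/`reset` fields as the "API Status" line in the log above, and querying it does not count against the limit. A sketch; `preflight` and `fetch_rate_limit` are hypothetical helpers, and it checks only the REST `core` bucket (GraphQL has its own `graphql` bucket in the same response):

```python
import json
import os
import time
import urllib.request
from typing import Callable, Dict, Mapping, Optional


def fetch_rate_limit(token: str) -> Dict:
    """Call GitHub's GET /rate_limit endpoint and return the decoded JSON."""
    req = urllib.request.Request(
        "https://api.github.com/rate_limit",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def preflight(
    env: Optional[Mapping[str, str]] = None,
    fetch: Callable[[str], Dict] = fetch_rate_limit,
    now: Optional[float] = None,
) -> str:
    """Fail fast if GITHUB_API_TOKEN is missing; otherwise summarize the quota."""
    env = os.environ if env is None else env
    token = env.get("GITHUB_API_TOKEN")
    if not token:
        raise SystemExit("GITHUB_API_TOKEN is not set")
    core = fetch(token)["resources"]["core"]
    now = time.time() if now is None else now
    # "reset" is a Unix timestamp; report the seconds remaining until it.
    wait = max(0, int(core["reset"] - now))
    return f"{core['remaining']}/{core['limit']} requests left; resets in {wait}s"
```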