Summary
Between the 21st March 2025 and the 14th May 2025 (commits 6016c14 and 8eb3567), the pgai repository was vulnerable to an attack allowing the exfiltration of all secrets used in one workflow. In particular, an attacker could have obtained a GITHUB_TOKEN with write permissions for the repository, which would have allowed them to tamper with all aspects of the repository, including pushing arbitrary code and releases.
After conducting a comprehensive audit of all workflow executions and repository activity, we found no evidence that this vulnerability was exploited. The integrity of the pgai codebase remains intact.
Details
The .github/workflows/huggingface-dataset.yml workflow used the pull_request_target event trigger, checked out the head.sha of the pull request, and executed the scripts/generate_huggingface_dataset.py Python script. The pull_request_target trigger runs the workflow file from the base repository, but with full access to the repository's secrets and with a GITHUB_TOKEN that has full read/write access to the repository.
As a result, untrusted code from a pull request opened from a fork could be executed, and that code could exfiltrate the GITHUB_TOKEN and HUGGINGFACE_HUB_TIMESCALE_TOKEN secrets. Because the GITHUB_TOKEN has write access to the repository, it can be used to modify many objects in the repository. This posed a significant supply-chain risk, as the token could be used to push arbitrary code to the repository.
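The combination described above (a pull_request_target trigger plus a checkout of the pull request's head) can be searched for mechanically. The following is a minimal sketch of such a check, not the audit tooling used by the pgai maintainers. The workflow text in RISKY_WORKFLOW is a hypothetical example written for illustration and is not the actual contents of huggingface-dataset.yml; the function names are likewise assumptions. The check is a text heuristic and does not parse YAML, so it can produce false positives (for example, on comments).

```python
import re

# Hypothetical workflow text for illustration only. This is NOT the real
# huggingface-dataset.yml; it only mirrors the pattern the advisory describes.
RISKY_WORKFLOW = """\
on:
  pull_request_target:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: python scripts/generate_huggingface_dataset.py
"""


def uses_pull_request_target(workflow_text: str) -> bool:
    """Return True if the text mentions the pull_request_target trigger."""
    return re.search(r"\bpull_request_target\b", workflow_text) is not None


def checks_out_pr_head(workflow_text: str) -> bool:
    """Return True if the text references the pull request's head sha or ref."""
    pattern = r"github\.event\.pull_request\.head\.(sha|ref)"
    return re.search(pattern, workflow_text) is not None


def is_risky(workflow_text: str) -> bool:
    """Flag workflows that run with base-repository privileges on fork code."""
    return uses_pull_request_target(workflow_text) and checks_out_pr_head(workflow_text)


if __name__ == "__main__":
    print("risky:", is_risky(RISKY_WORKFLOW))
```

Either condition on its own is not necessarily a problem; it is running fork-controlled code while the privileged token and secrets are available that creates the exposure.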
Impact
The potential impact of this vulnerability is large. An attacker could have used it to:
- poison the pgai codebase
- publish malicious releases of pgai on GitHub
- publish malicious releases of pgai on PyPI
We have found no evidence that this vulnerability was exploited. Any exploitation would have originated in a run of the huggingface-dataset.yml workflow. An audit of all executions of the vulnerable workflow revealed that it was not exploited by a malicious party. Because the permissions of the GITHUB_TOKEN would allow a malicious actor to delete a workflow run, we also audited the GitHub Audit Log, which showed that no workflow runs were deleted. We do not believe that the integrity of the pgai codebase has been compromised.
Timeline
The vulnerability was reported by a team of security researchers on the 14th May 2025. Within hours, steps were taken to mitigate the vulnerability.
The vulnerability was disclosed on the 17th June 2025.
Mitigation
The .github/workflows/huggingface-dataset.yml workflow was fixed in #742, closing the vulnerability. The fixes include switching from pull_request_target to pull_request and explicitly reducing the scope of the GITHUB_TOKEN to read-only access. The GITHUB_TOKEN expires within 24 hours of job execution, so it did not need to be rotated. The HUGGINGFACE_HUB_TIMESCALE_TOKEN was rotated.
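The two changes described above can also be checked with a small text heuristic. This is a sketch under stated assumptions, not the change made in #742: FIXED_WORKFLOW is a hypothetical workflow written for illustration, and expressing read-only access as a top-level permissions block with contents: read is an assumption about how the scope reduction might look. The function name is also hypothetical.

```python
import re

# Hypothetical post-fix workflow text for illustration only; the real change
# in #742 may be structured differently.
FIXED_WORKFLOW = """\
on:
  pull_request:
permissions:
  contents: read
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python scripts/generate_huggingface_dataset.py
"""


def looks_mitigated(workflow_text: str) -> bool:
    """Heuristic: pull_request trigger, no pull_request_target, read-only contents."""
    # "_" is a word character, so \bpull_request\b does not match
    # inside "pull_request_target".
    has_target = re.search(r"\bpull_request_target\b", workflow_text) is not None
    has_pull_request = re.search(r"\bpull_request\b", workflow_text) is not None
    read_only = re.search(
        r"^\s*permissions:\s*\n\s+contents:\s*read\s*$",
        workflow_text,
        re.MULTILINE,
    ) is not None
    return has_pull_request and not has_target and read_only


if __name__ == "__main__":
    print("mitigated:", looks_mitigated(FIXED_WORKFLOW))
```

With the pull_request trigger, workflows started from forks do not receive the repository's secrets, and the explicit permissions block limits what the token can do even if untrusted code runs.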
Credits
Kindly reported by @darryk10 @AlbertoPellitteri @loresuso