Secrets exfiltration via `pull_request_target`

Critical
alejandrodnm published GHSA-89qq-hgvp-x37m Jun 17, 2025

Package

actions .github/workflows/huggingface-dataset.yml (GitHub Actions)

Affected versions

736a10424cd1ee80bb3c07f5ca7476a2b6be6473

Patched versions

8eb356729c33560ce54b88b9a956960ad1e3ede8

Description

Summary

Between the 21st March 2025 and the 14th May 2025 (commits 6016c14 and 8eb3567), the pgai repository was vulnerable to an attack that allowed the exfiltration of all secrets used in one workflow. These included the GITHUB_TOKEN with write permissions for the repository, which would have allowed an attacker to tamper with all aspects of the repository, including pushing arbitrary code and releases.

After conducting a comprehensive audit of all workflow executions and repository activity, we found no evidence that this vulnerability was exploited. The integrity of the pgai codebase remains intact.

Details

The .github/workflows/huggingface-dataset.yml workflow used the pull_request_target event trigger, checked out the head.sha of the pull request, and executed the scripts/generate_huggingface_dataset.py Python script. The pull_request_target trigger runs the workflow file from the base repository, with full access to the repository's secrets and with a GITHUB_TOKEN that has full read/write access to the repository.

This meant that untrusted code from a pull request opened from a fork would be executed in that privileged context, where it could exfiltrate both the GITHUB_TOKEN and the HUGGINGFACE_HUB_TIMESCALE_TOKEN secret. Because the GITHUB_TOKEN has write access to the repository, it can be used to modify many objects in the repository, including pushing arbitrary code, which poses a significant supply-chain risk.
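The pattern described above can be sketched as a minimal workflow. This is an illustrative reconstruction from the description, not the actual contents of huggingface-dataset.yml: the job name, runner, action version, and the HF_TOKEN environment variable name are all assumptions.

```yaml
# Hypothetical sketch of the risky pattern; NOT the actual pgai workflow file.
name: huggingface-dataset (illustrative)

on:
  pull_request_target:   # runs in the base repository's context, with its secrets

jobs:
  generate:              # job name is an assumption
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Checks out the pull request's code instead of the trusted base branch.
          ref: ${{ github.event.pull_request.head.sha }}

      # The script now comes from the pull request, so its author controls
      # what runs here and can read anything exposed to this step.
      - run: python scripts/generate_huggingface_dataset.py
        env:
          # Variable name is an assumption; the secret name is from the advisory.
          HF_TOKEN: ${{ secrets.HUGGINGFACE_HUB_TIMESCALE_TOKEN }}
```

The combination is what matters: pull_request_target alone runs only trusted code from the base branch, and checking out head.sha alone is harmless under an unprivileged trigger. Together, they run fork-controlled code with the base repository's credentials.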

Impact

The potential impact of this vulnerability is large. An attacker could have used it to:

  • poison the pgai codebase
  • publish malicious releases of pgai on GitHub
  • publish malicious releases of pgai on PyPI

We have found no evidence that this vulnerability was exploited. Any exploitation would have originated in a run of the huggingface-dataset.yml workflow, and an audit of all executions of the vulnerable workflow shows that none was triggered by a malicious party. Because the permissions of the GITHUB_TOKEN would allow a malicious actor to delete a workflow run, we also audited the GitHub Audit Log, which showed that no workflow runs were deleted. We do not believe that the integrity of the pgai codebase has been compromised.

Timeline

The vulnerability was reported by a team of security researchers on the 14th May 2025. Within hours, steps were taken to mitigate it.

The vulnerability was disclosed on the 17th June 2025.

Mitigation

The .github/workflows/huggingface-dataset.yml workflow was fixed in #742, which closed the vulnerability. The fixes include switching from pull_request_target to pull_request, and explicitly reducing the scope of the GITHUB_TOKEN to read-only access. The GITHUB_TOKEN expires within 24 hours of job execution, so it did not need to be rotated. The HUGGINGFACE_HUB_TIMESCALE_TOKEN was rotated.
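A minimal sketch of a workflow hardened along these lines might look as follows. It is not the actual change made in #742: the job name, runner, and action version are assumptions, and only the trigger switch and the read-only token scope are taken from the description above.

```yaml
# Hypothetical sketch of the hardened pattern; see #742 for the real change.
name: huggingface-dataset (illustrative)

on:
  pull_request:          # pull requests from forks receive no repository secrets

permissions:
  contents: read         # explicit read-only scope for the GITHUB_TOKEN

jobs:
  generate:              # job name is an assumption
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python scripts/generate_huggingface_dataset.py
```

With the pull_request trigger, code from a fork still runs, but without access to the repository's secrets and with a token that cannot write to the repository.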

Credits

Kindly reported by @darryk10 @AlbertoPellitteri @loresuso

Severity

Critical

CVSS overall score

9.1 / 10

CVSS v3 base metrics

Attack vector: Network
Attack complexity: Low
Privileges required: None
User interaction: None
Scope: Unchanged
Confidentiality: High
Integrity: High
Availability: None

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N

CVE ID

CVE-2025-52467

Weaknesses

No CWEs
