LlamaIndex vulnerability in ArxivReader class can cause MD5 hash collisions
Moderate severity
GitHub Reviewed
Published Jul 7, 2025 to the GitHub Advisory Database • Updated Jul 8, 2025
Published by the National Vulnerability Database Jul 7, 2025
Published to the GitHub Advisory Database Jul 7, 2025
Last updated Jul 8, 2025
Reviewed Jul 8, 2025
Description
A vulnerability in the ArxivReader class of the run-llama/llama_index repository allows MD5 hash collisions when generating filenames for downloaded papers. Because papers with identical titles but different contents can be assigned the same filename, a later download may overwrite an earlier one. The resulting data loss prevents some papers from being processed for AI model training. The issue is resolved in llama-index-readers-papers version 0.3.1 (in llama-index 0.12.28).
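The overwrite behaviour described above can be illustrated with a minimal sketch. It is not the ArxivReader source code: the function names, the sample records, and the choice of hashing the title alone are assumptions made for illustration, based only on the advisory's statement that papers with identical titles collide. The mitigation shown (mixing a unique identifier into the hash) is one plausible approach and not necessarily what the 0.3.1 fix does.

```python
import hashlib


def title_based_filename(title: str) -> str:
    """Hypothetical naming scheme: the filename depends on the title only."""
    return hashlib.md5(title.encode("utf-8")).hexdigest() + ".pdf"


def id_based_filename(entry_id: str, title: str) -> str:
    """Hypothetical mitigation: include a unique identifier in the hash input."""
    digest = hashlib.sha256(f"{entry_id}\n{title}".encode("utf-8")).hexdigest()
    return digest + ".pdf"


def store(papers, name_fn):
    """Simulate writing files to disk: a repeated name overwrites the old entry."""
    files = {}
    for paper in papers:
        files[name_fn(paper)] = paper["content"]
    return files


# Two made-up papers that share a title but differ in content.
papers = [
    {"entry_id": "example-id-1", "title": "A Survey", "content": b"first paper"},
    {"entry_id": "example-id-2", "title": "A Survey", "content": b"second paper"},
]

colliding = store(papers, lambda p: title_based_filename(p["title"]))
distinct = store(papers, lambda p: id_based_filename(p["entry_id"], p["title"]))

print(len(colliding))  # 1: the second paper replaced the first
print(len(distinct))   # 2: both papers are kept
```

In this sketch the collision does not require a cryptographic weakness in MD5: identical titles produce identical inputs and therefore identical filenames, which is why a distinguishing identifier in the hash input is enough to keep the files apart.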
References