pyrxiv is a Python package for retrieving arXiv papers, storing their metadata in pydantic-like classes, and optionally keeping only the papers whose content matches a regex pattern.
While originally developed for the Strongly Correlated Electron Systems community in Condensed Matter Physics (cond-mat.str-el), it's designed to be flexible and applicable to any arXiv category.
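To give a feel for what "pydantic-like" metadata storage can mean, here is a minimal sketch using only the standard library. The `PaperMetadata` class and all of its fields are hypothetical illustrations; they are not pyrxiv's actual classes, whose names and attributes may differ.

```python
from dataclasses import dataclass, field


@dataclass
class PaperMetadata:
    """Hypothetical container for arXiv metadata (not pyrxiv's real class)."""

    arxiv_id: str
    title: str
    authors: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Validate on construction, in the spirit of pydantic models.
        if not self.arxiv_id.strip():
            raise ValueError("arxiv_id must not be empty")
        # Collapse the stray whitespace and line breaks common in titles.
        self.title = " ".join(self.title.split())


paper = PaperMetadata(
    arxiv_id="0000.00000",  # placeholder identifier, not a real paper
    title="An  example\n title",
    categories=["cond-mat.str-el"],
)
print(paper.title)
```

The point of the pattern is that invalid or messy input is caught or normalized once, when the object is created, rather than at every place the metadata is used.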
Install the core package:
pip install pyrxiv
The main objective of pyrxiv is to provide an easy command-line interface (CLI) to search for and download arXiv papers whose content matches a regex pattern. By default, the arXiv PDFs are downloaded. You can optionally save metadata to HDF5 files. After installing the package, you can print the CLI options with:
pyrxiv --help
or directly:
pyrxiv search_and_download --help
For example, to download PDFs:
pyrxiv search_and_download --category cond-mat.str-el --regex-pattern "DMFT|Hubbard" --n-papers 5
Or to also save metadata to HDF5 files:
pyrxiv search_and_download --category cond-mat.str-el --regex-pattern "DMFT|Hubbard" --n-papers 5 --save-hdf5
Note: When using --regex-pattern, the tool keeps fetching papers from arXiv until it finds the specified number of papers (--n-papers) that match the pattern. Papers that don't match the regex are automatically discarded.
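The fetch-until-enough-match behavior described in the note can be sketched in plain Python. This is an illustration of the idea only, not pyrxiv's implementation: the `take_matching` helper and the sample texts are hypothetical, and a plain list of strings stands in for papers fetched from arXiv.

```python
import re
from collections.abc import Iterable


def take_matching(texts: Iterable[str], pattern: str, n_papers: int) -> list[str]:
    """Consume texts in order until n_papers of them match the pattern."""
    regex = re.compile(pattern)
    kept: list[str] = []
    if n_papers <= 0:
        return kept
    for text in texts:
        # Non-matching texts are simply skipped (discarded).
        if regex.search(text):
            kept.append(text)
            if len(kept) == n_papers:
                break  # stop fetching once enough matches are found
    return kept


# Made-up paper contents, used only for this example.
sample_texts = [
    "We apply DMFT to a multi-orbital system.",
    "A study of graphene transport.",
    "The Hubbard model on a square lattice.",
    "Another DMFT benchmark.",
]
matches = take_matching(sample_texts, "DMFT|Hubbard", n_papers=2)
print(matches)
```

Because the loop stops as soon as the target count is reached, the fourth text is never inspected. With a rare pattern, by contrast, many papers may need to be fetched before enough matches are found.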
For a comprehensive guide on how to use the CLI and recommended pipelines, see the How to Use pyrxiv documentation.
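If you want to script the CLI calls shown earlier, one option is to assemble the command in Python and hand it to `subprocess`. The sketch below uses only the flags that appear in this README; the `build_command` helper is a hypothetical convenience, not part of pyrxiv.

```python
import shlex


def build_command(
    category: str, regex_pattern: str, n_papers: int, save_hdf5: bool = False
) -> list[str]:
    """Build the argument list for `pyrxiv search_and_download`."""
    cmd = [
        "pyrxiv",
        "search_and_download",
        "--category", category,
        "--regex-pattern", regex_pattern,
        "--n-papers", str(n_papers),
    ]
    if save_hdf5:
        cmd.append("--save-hdf5")
    return cmd


cmd = build_command("cond-mat.str-el", "DMFT|Hubbard", 5, save_hdf5=True)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))

# To actually run it (requires pyrxiv to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the `|` character in the regex pattern.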
To contribute to pyrxiv or run it locally, follow these steps:
git clone https://github.com/JosePizarro3/pyrxiv.git
cd pyrxiv
We recommend Python ≥ 3.10:
python3 -m venv .venv
source .venv/bin/activate
Use uv (faster than pip) to install the package in editable mode with dev extras:
pip install --upgrade pip
pip install uv
uv pip install -e ".[dev]"
Use pytest with verbosity to run all tests:
python -m pytest -sv tests
To check code coverage:
python -m pytest --cov=pyrxiv tests
We use Ruff for formatting and linting (configured via pyproject.toml).
Check linting issues:
ruff check .
Auto-format code:
ruff format .
Manually fix anything Ruff cannot handle automatically.