Named Entity Recognition for arxiv papers (NERxiv) is a Python wrapper tool for extracting structured metadata from scientific papers on arXiv using LLMs and modern retrieval-augmented generation (RAG) techniques.
Visit the documentation page to learn how to use this tool.
- Uses pyrxiv to fetch, download, and extract text from arXiv papers
- Chunks and embeds text with SentenceTransformers or LangChain to categorize paper content using local LLMs (via Ollama)
- Includes CLI tools and notebook tutorials for reproducible workflows
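The chunk, retrieve, and prompt flow listed above can be illustrated with a minimal standard-library sketch. This is not NERxiv's API: every function name here (`chunk_text`, `top_chunks`, `build_ollama_request`) is hypothetical, and a simple bag-of-words cosine similarity stands in for the SentenceTransformers embeddings the tool uses. The only external detail assumed is that Ollama's `/api/generate` endpoint accepts a JSON body with `model`, `prompt`, and `stream` fields.

```python
import json
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words sharing `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        chunks, key=lambda c: cosine(bag_of_words(c), query_vec), reverse=True
    )
    return ranked[:k]


def build_ollama_request(model: str, question: str, context: list[str]) -> dict:
    """Build a JSON body for POST http://localhost:11434/api/generate (assumed default port)."""
    prompt = (
        "Answer using only the context below.\n\nContext:\n"
        + "\n---\n".join(context)
        + f"\n\nQuestion: {question}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


if __name__ == "__main__":
    paper = (
        "We study the Hubbard model on a square lattice. "
        "Calculations use dynamical mean-field theory with a quantum Monte Carlo solver. "
        "The temperature is fixed and the interaction strength is varied."
    )
    question = "Which solver is used?"
    chunks = chunk_text(paper, chunk_size=12, overlap=3)
    payload = build_ollama_request("llama3", question, top_chunks(chunks, question, k=1))
    print(json.dumps(payload, indent=2))
```

The sketch only builds the request body; sending it requires a running Ollama server with the chosen model pulled. Overlapping windows are used so that a sentence cut at a chunk boundary still appears whole in at least one chunk.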
Install the core package:
pip install nerxiv
We recommend running your own models locally using Ollama:
# Install Ollama (follow instructions on their website)
ollama pull <model-name> # e.g., llama3, deepseek-r1, qwen3:30b
# Start the local server
ollama serve
To contribute to NERxiv or run it locally, follow these steps:
git clone https://github.com/JosePizarro3/NERxiv.git
cd NERxiv
We recommend Python ≥ 3.10. Create and activate a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
Use uv (faster than pip) to install the package in editable mode with dev and docu extras:
pip install --upgrade pip
pip install uv
uv pip install -e '.[dev,docu]'
Run all tests with pytest in verbose mode:
python -m pytest -sv tests
To check code coverage:
python -m pytest --cov=nerxiv tests
We use Ruff for formatting and linting (configured via pyproject.toml).
Check linting issues:
ruff check .
Check formatting (drop --check to rewrite the files in place):
ruff format . --check
Manually fix anything Ruff cannot handle automatically.
To view the documentation locally, first install the [docu] extra:
uv pip install -e '.[docu]'
Note: This command installs mkdocs, mkdocs-material, and other documentation-related dependencies.
The first time, build the documentation site:
mkdocs build
Run the documentation server:
mkdocs serve
The output looks like:
INFO - Building documentation...
INFO - Cleaning site directory
INFO - [14:07:47] Watching paths for changes: 'docs', 'mkdocs.yml'
INFO - [14:07:47] Serving on http://127.0.0.1:8000/
Open http://127.0.0.1:8000/ in your browser. Changes to the documentation's Markdown files are reflected as soon as the files are saved (the local site refreshes automatically).