ArXplorer

Recommender of daily papers from arXiv, customized with your Prompt. Minimal, hackable, no-boilerplate.

"I like innovative papers in large foundation models, multimodal methods, symbolic reasoning and automation."

What's up

We are all overwhelmed by papers on arXiv. With ~300 new additions daily in the cs.AI section alone, sifting through them can be daunting. This project scrapes the daily feed from https://arxiv.org/list/{namespace}/new, collects author data, and performs a two-stage ranking:

Coarse Ranking: Use the authors' impact index and a CPU-friendly embedding model (per MTEB leaderboard 🤗) to reduce the candidate pool to ~20 papers by weighted Copeland ranking.
Reranking: Optionally use GPT-4 to choose the top k and write a summary (which is cheap, at just one call per day).
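
To make the coarse stage concrete, here is a minimal sketch of a weighted Copeland ranking. It is an illustration under assumptions, not the code in this repository: the function names, the criterion names (`similarity`, `author_impact`), and the weights are all hypothetical. Each pair of papers is compared on every criterion, the weights of the criteria each paper wins are summed, and the pairwise winner gains a point while the loser drops one.

```python
from itertools import combinations


def weighted_copeland(scores, weights):
    """Return one Copeland score per item (hypothetical helper).

    scores:  dict mapping criterion name -> list of per-item scores,
             where higher is better.
    weights: dict mapping criterion name -> weight of that criterion.
    """
    n = len(next(iter(scores.values())))
    copeland = [0.0] * n
    for i, j in combinations(range(n), 2):
        # Weighted vote between items i and j across all criteria.
        margin = 0.0
        for name, values in scores.items():
            if values[i] > values[j]:
                margin += weights[name]
            elif values[i] < values[j]:
                margin -= weights[name]
        if margin > 0:  # i wins the pairwise contest
            copeland[i] += 1
            copeland[j] -= 1
        elif margin < 0:  # j wins the pairwise contest
            copeland[j] += 1
            copeland[i] -= 1
    return copeland


def coarse_rank(scores, weights, coarse_k):
    """Return indices of the coarse_k items with the best Copeland score."""
    copeland = weighted_copeland(scores, weights)
    order = sorted(range(len(copeland)), key=lambda i: copeland[i], reverse=True)
    return order[:coarse_k]


# Toy data: three papers, two assumed criteria.
example_scores = {
    "similarity": [0.9, 0.5, 0.1],
    "author_impact": [10, 50, 20],
}
example_weights = {"similarity": 2.0, "author_impact": 1.0}
print(coarse_rank(example_scores, example_weights, coarse_k=2))
```

Because only the pairwise ordering matters, criteria on very different scales (a cosine similarity and a citation-based index, say) can be combined without normalizing them first.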

Quick Start

Prepare environment

conda create -n "arxplorer" python==3.11
conda activate arxplorer
pip install -r requirements.txt

(Recommended) Use an OpenAI key for summarization and better ranking.

echo 'OPENAI_API_KEY=your_api_key_here' >> .env

GO!

python run.py

Customization

You can customize your preferences or interests by setting INSTRUCTION in .env:

echo 'INSTRUCTION="I like ..."' >> .env

Use namespace to specify the arXiv section to scrape (make sure https://arxiv.org/list/{namespace}/new can be visited). Use top_k to set the final number of papers you want to see. coarse_k is the intermediate number of papers kept after coarse ranking and should always be larger than top_k.

python run.py --namespace="cs.AI" --top_k=10 --coarse_k=20
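
As a rough sketch of how these options relate, the snippet below builds the feed URL from a namespace and checks the coarse_k > top_k constraint. The helper names `build_feed_url` and `check_k_values` are hypothetical and are not taken from run.py; only the URL pattern and the constraint come from the text above.

```python
def build_feed_url(namespace):
    """Return the arXiv "new" listing URL for a section such as "cs.AI"."""
    return f"https://arxiv.org/list/{namespace}/new"


def check_k_values(top_k, coarse_k):
    """Raise ValueError unless coarse_k is larger than top_k (hypothetical check)."""
    if coarse_k <= top_k:
        raise ValueError(
            f"coarse_k ({coarse_k}) must be larger than top_k ({top_k})"
        )


check_k_values(top_k=10, coarse_k=20)  # passes silently
print(build_feed_url("cs.AI"))
```

Opening the printed URL in a browser is a quick way to confirm that the namespace exists before running the scraper.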

fast_mode is set to True by default, which ignores author-related features. Collecting author data stably (using scholarly and free-proxy) can be painfully slow at first, and gets faster as authors_cache.db builds up the cache. If you are deploying on a server or have ~1hr to let it run, turn fast mode off:

python run.py --fast_mode=False

Disclaimer

This ranker is soooo biased, and I'm pretty sure some cool papers are overlooked. But I find it helpful for catching some of the papers I would regret missing.

Next Step

I'll create a Twitter bot soon to serve this project as a daily feed. Feel free to contact me at magician1206 (Discord) with suggestions, or to contribute more features, faster pipelines, etc. :)

checkmate17/Daily-paper-using-OpenAI

Folders and files

Latest commit

History

Repository files navigation

ArXplorer

What's up

Quick Start

Customization

Disclaimer

Next Step

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages