Invalid entries in multi-member ID lists cause entry repetition #82

Open
@lukasschwab

Description

If id_list consists of a single nonexistent (but validly formatted) ID, arXiv returns an empty feed, which the client interprets as "no results."

If id_list contains both existing and nonexistent valid IDs (e.g. ["0000.0000", "1707.08567"]), the feed is non-empty (it contains a single entry), but it reports feed.feed.opensearch_totalresults == 2. The client takes this to mean the page is partial and requests another page at offset 1, which lists paper 1707.08567 again. This is an arXiv API bug.
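To make the failure mode concrete, here is a simplified model of the paging logic described above. It is not arxiv.py's actual implementation: `fake_arxiv_page` and `naive_paginate` are hypothetical names, and the stub simply assumes the reported API behavior (every page lists the one existing paper while totalResults stays at 2).

```python
from typing import Callable, List, Tuple

# A page is (entry IDs on this page, totalResults reported by the feed).
Page = Tuple[List[str], int]


def fake_arxiv_page(offset: int, page_size: int) -> Page:
    """Hypothetical stub mimicking the reported behavior for
    id_list=["0000.0000", "1707.08567"]: every page lists the one existing
    paper, but the feed always claims two total results."""
    return (["1707.08567"], 2)


def naive_paginate(fetch: Callable[[int, int], Page], page_size: int = 100) -> List[str]:
    """Keep requesting pages until the number of collected entries reaches
    the reported totalResults (the assumption that breaks here)."""
    collected: List[str] = []
    offset = 0
    while True:
        entries, total = fetch(offset, page_size)
        collected.extend(entries)
        offset += len(entries)
        # Stop once we hold as many entries as the feed claims exist,
        # or when a page comes back empty.
        if len(collected) >= total or not entries:
            return collected


print(naive_paginate(fake_arxiv_page))  # ['1707.08567', '1707.08567']
```

Because the second request at offset 1 returns the same entry, trusting totalResults is enough to produce the duplicate.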

Notably, this behavior differs depending on the nonexistent ID. Nonexistent ID 1507.58567 yields an entry with missing fields (covered in #80, fixed by #82), whereas 1407.58567 yields no entries at all (covered here).

Example: https://export.arxiv.org/api/query?id_list=1407.58567,1707.08567
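One way to inspect the mismatch in the example feed is to compare the reported totalResults with the entries actually present. The sketch below assumes the feed uses the standard Atom and OpenSearch 1.1 namespaces; `summarize_feed` is a hypothetical helper, not part of arxiv.py.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# Namespace URIs assumed for the arXiv API's Atom feed.
ATOM = "{http://www.w3.org/2005/Atom}"
OPENSEARCH = "{http://a9.com/-/spec/opensearch/1.1/}"


def summarize_feed(xml_text: Union[str, bytes]) -> Tuple[int, List[str]]:
    """Return (reported totalResults, list of entry <id> values) for an
    Atom feed, so the two can be compared directly."""
    root = ET.fromstring(xml_text)
    total_el = root.find(f"{OPENSEARCH}totalResults")
    total = int(total_el.text) if total_el is not None and total_el.text else 0
    ids = [
        (entry.findtext(f"{ATOM}id") or "").strip()
        for entry in root.findall(f"{ATOM}entry")
    ]
    return total, ids
```

Feeding it the bytes from `urllib.request.urlopen(url).read()` for the example URL should show whether totalResults exceeds the number of entries on a page that was not truncated by max_results.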

Steps to reproduce

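import unittest

import arxiv


class TestInvalidIDs(unittest.TestCase):
    def test_invalid_id(self):
        # A lone nonexistent ID yields an empty feed: no results.
        results = list(arxiv.Search(id_list=["0000.0000"]).results())
        self.assertEqual(len(results), 0)
        # Mixing a nonexistent ID with an existing one should yield one result.
        results = list(arxiv.Search(id_list=["0000.0000", "1707.08567"]).results())
        self.assertEqual(len(results), 1)  # Fails: 1707.08567 appears twice.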

Expected behavior

Results should not be duplicated.

Searching for ["0000.0000", "1707.08567"] should yield a single result.
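A possible client-side mitigation, sketched under the same hypothetical stub as a page-fetching function: yield each entry ID at most once and stop paging as soon as a page contributes nothing new. This is an illustration, not how arxiv.py actually handles the case; `paginate_unique` and `fake_arxiv_page` are hypothetical names.

```python
from typing import Callable, Iterator, List, Set, Tuple

Page = Tuple[List[str], int]  # (entry IDs on the page, reported totalResults)


def paginate_unique(fetch: Callable[[int, int], Page], page_size: int = 100) -> Iterator[str]:
    """Hypothetical guard: yield each entry ID at most once, and stop when a
    page adds no unseen entries, even if totalResults claims more exist."""
    seen: Set[str] = set()
    offset = 0
    while True:
        entries, total = fetch(offset, page_size)
        added = 0
        for entry_id in entries:
            if entry_id not in seen:
                seen.add(entry_id)
                added += 1
                yield entry_id
        if added == 0:
            return  # Empty or fully repeated page: stop paging.
        offset += len(entries)
        if offset >= total:
            return


def fake_arxiv_page(offset: int, page_size: int) -> Page:
    # Stub mimicking the reported API behavior: one real entry, totalResults == 2.
    return (["1707.08567"], 2)


print(list(paginate_unique(fake_arxiv_page)))  # ['1707.08567']
```

The trade-off is that the client stops trusting totalResults whenever it contradicts the entries actually returned, which matches the expected single result here.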

Versions

  • python version: 3.7.9
  • arxiv.py version: 1.4.1

Labels

  • api: Issues that correspond to arXiv API behavior rather than behavior introduced by this wrapper.
  • bug: Deviations from documented behavior.
