Pagination / offset of results for the SPARQL endpoint is faulty #150

@dhimmel

Thanks for making the MeSH RDF SPARQL API. It's been convenient for quick access to MeSH.

I'd like to do a query that returns over 1000 results, and therefore need to figure out how to use pagination with the SPARQL API at https://id.nlm.nih.gov/mesh/sparql. Here's my query to return a table of descriptors:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
SELECT *
FROM <http://id.nlm.nih.gov/mesh/2020>
WHERE {
  ?mesh_uri a meshv:Descriptor .
  ?mesh_uri meshv:identifier ?mesh_id.
  ?mesh_uri rdfs:label ?mesh_label .
} 
ORDER BY ?mesh_uri

But I'm having trouble incrementing limit and offset to retrieve all results.
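For reference, the loop I'm aiming for looks roughly like the sketch below. It assumes that the `limit` and `offset` API parameters act as page size and starting row, which is exactly what this issue calls into question, and that 1000 is the largest page the endpoint will return. The helper names `fetch_page` and `fetch_all` are hypothetical and used only for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://id.nlm.nih.gov/mesh/sparql"


def fetch_page(query, limit, offset):
    """Fetch one page of bindings (hypothetical helper).

    Assumes the `limit` and `offset` API parameters select the page.
    """
    params = urllib.parse.urlencode(
        {"query": query, "format": "json", "limit": limit, "offset": offset}
    )
    with urllib.request.urlopen(f"{API_URL}?{params}") as response:
        return json.load(response)["results"]["bindings"]


def fetch_all(query, page_size=1000, fetch=fetch_page):
    """Collect every row by advancing the offset one page at a time."""
    rows = []
    offset = 0
    while True:
        page = fetch(query, page_size, offset)
        rows.extend(page)
        # A short (or empty) page means there is nothing left to request.
        if len(page) < page_size:
            break
        offset += page_size
    return rows
```

The `fetch` argument exists so the loop can be exercised against a stand-in function without touching the network. The query passed in should keep its `ORDER BY` clause so that consecutive pages are taken from a stable ordering.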

In search of a more reproducible example, I've simplified it to this API call, generated by this Python code:

import requests

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
SELECT *
FROM <http://id.nlm.nih.gov/mesh/2020>
WHERE {
  ?mesh_uri a meshv:Descriptor .
  ?mesh_uri meshv:identifier ?mesh_id.
  ?mesh_uri rdfs:label ?mesh_label .
} 
ORDER BY ?mesh_uri
LIMIT 5
"""

params = {
    "query": query,
    "format": "json",
    "inference": True,
    "limit": 10,
    "offset": 4,
    "year": 2020,
}
api_url = "https://id.nlm.nih.gov/mesh/sparql"
response = requests.get(api_url, params=params)
print(response.url)
# Number of records returned under results.bindings
print(len(response.json()["results"]["bindings"]))

The expected result is to receive a single record (the 5th record), because the query should return 5 records, and the offset is 4. Instead, 5 records are returned. The returned records under results.bindings start with:

      {
        "mesh_uri": { "type": "uri" , "value": "http://id.nlm.nih.gov/mesh/2020/D000005" } ,
        "mesh_id": { "type": "literal" , "value": "D000005" } ,
        "mesh_label": { "type": "literal" , "xml:lang": "en" , "value": "Abdomen" }
      } ,

So it looks like the offset was respected, but either the SPARQL LIMIT 5 or the API parameter limit=10 is not being applied the way I expected.
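To make the two readings concrete, here is a small offline sketch using list slicing. The ten IDs are placeholders generated for illustration, not data fetched from the endpoint, and the variable names are hypothetical.

```python
# Placeholder IDs standing in for the ordered query results.
ids = [f"D{n:06d}" for n in range(1, 11)]

# Reading 1 (what I expected): the query's LIMIT 5 runs first,
# then offset=4 skips four of those five rows.
limit_then_offset = ids[:5][4:]

# Reading 2 (matches what I observed): offset=4 is applied first,
# then five rows are taken from that point.
offset_then_limit = ids[4:][:5]

print(len(limit_then_offset), limit_then_offset[0])
print(len(offset_then_limit), offset_then_limit[0])
```

Both readings begin at the same record, so the first row alone cannot tell them apart. Only the row count (1 versus 5) distinguishes them.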
