-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Paging improvements #23
Comments
Review this from WFS 2.0: http://docs.opengeospatial.org/is/09-025r2/09-025r2.html#76 Sounds pretty close to what you are describing no? ... ignoring all the references to capabilities documents which would no longer be relevant in WFS 3.0 anyway. WFS 2.0 supported two types of paging. The first used startIndex and count and was there for backward compatibility with WFS 1.X. The second method used count and prev/next links in the response. I always preferred that latter so I would be good with tossing the legacy and sticking with count/prev/next. The draft says that the 10000 value is only an example and may be changed (see Req 17 in the core). |
Yeah, that's definitely right in line. Count & prev/next links. Would be happy to toss the legacy and do the count/prev/next. Ah, and missed that the 10000 is an example. I think that is the challenge with the current format, how to get across what can be tweaked and what must be there - though I've got no great ideas on how to do it better, as the current spec does a decent job, particularly with the sample openapi document. |
Tossing the startIndex legacy would be fine with me, too. So we would "just" keep count and it would be up to the implementation how it creates the prev/next links (which could use a parameter like startIndex). To make it clearer that the 10 and 10000 are just examples, maybe we should add an explicit sentence in the count requirement that default and maximum count values have to be declared in the API definition, but the values are determined by the server. |
That is correct. The prev/next links would be opaque. An implementation could generate whatever links it wants including using the startIndex parameter (which would now be reclassified as a vendor-specific parameter). |
Agreed. And yeah, an explicit sentence on the server determining values makes good sense. |
I think Should |
Agreed on both counts. Better to return records, even less than asked for, than return an error. And I'd missed 'prev' in there - definitely should make it optional. We don't do it in the Planet data API, as it's definitely more complex to support it, and we didn't see any win from doing it. Maybe we should put that one in its own ticket? |
By "suggestion" do we mean a maximum per fetch? count=100, I want 100 but the server can return any number up to 100 ... and provide a next link for next set of features. |
I think it probably boils down to:
So:
|
Don't think that the server should ever return more than requested ... If client requests 1 gets 1 or zero and possibly a next link. |
I agree with making prev optional and count as a hint with the constraint that the server must not return more entities than the count value. |
To summarize, the conclusion seems to be:
Is everyone ok with this or do we need more discussion? |
Okay with me @cportele |
Okay for the core. But let's not loose sight of Chris's original issues. Some of the paging practices he referenced allow random access to the pages (give me page n). This is a ubiquitous web capability. It is also far more difficult than it looks. Can we leave this issue open, or open a new issue, to track this for an extension to the core? |
Agreement on 2018-02-01: Implement the proposal above. Random page access can be implemented in the future in an extension, if it turns out that it is a common need. |
@cholmes how is a "saved query" actually saved? I mean, an efficient implementation in my mind would require a versioning backend, otherwise we'd have to dump the whole result without knowing if the user will ever crawl it, which does not seem to be feasible on a public server where everybody can start one or more paging operations. |
@aaime to clarify, there's absolutely nothing wrong with an implementation like ElasticSearch implements the Scroll and Search After APIs which offer two alternatives. For a DB, "Scroll" would map onto named cursors (but requires mapping to a consistent open DB connection, which has other issues). A simple DB-backed implementation of "Search After" is
https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/ has some other database-driven options too. And if you have a versioned backed then there are other possibilities as well. |
@rcoup thanks for the feedback. Those a great suggestions, easy to implement when one is focusing on publishing a particular data set. Doing the same in something generic as GeoServer (any data source, any sorting attribute) would seem challenging, but who knows, maybe one day ;-) |
I looked at range requests a year ago for "paging" (using features as units), but decided that it is not a good fit. I need to look, if I have any notes of my analysis.... |
Having the paging information in HTTP headers is problematic, since it can't be embedded to links (e.g. the |
Yes, there were a few reasons why I did not propose Range headers for "paging":
@cmheazel - What would be the practical benefits for using range headers in your view? And how to address the concerns above? |
The question came back up (among other things) because during CITE testing we found out that, in general, it's hard to test for proper implementation of paging. The servers are not required to expose a specific data set, that implies in order to test paging the test suite can only start from the first page and got to the very end to verify all possible links layouts. This may lead to tests taking hours or more.
To support it, one needs to have random paging access. In my experience, this is one of the most common WFS use cases, side by side with visual display of data (which is arguably better taken care of by vector tiles). |
I agree with @cportele's assessment here. It's something seems like it's applicable until you dive deeper, then the abstraction kinda falls apart quickly. |
@aaime but there's zero UX benefit to the paging here? It's not like I know what's on Page 4, or that page 5 is the "right" page for my answer. Next/Previous buttons and maybe an (estimated) total result count would achieve the same thing IME. |
Next and Forward help with random looks at the data, when you have no idea of the data layout, yes. About "it's not like I know what's on Page 4" I have a different experience. I've been working in municipalities where these paging UIs are used a lot, they normally have a default sorting, and after a while people start growing a sense of where is stuff they are looking for in the paging structure. We would use a filter to locate what we want, but not everybody does (and the WFS 3 basic filtering is not particularly useful to start with). Again, I don't see WFS 3 being particularly useful without extensions, and as said, this may just how it's meant to be, a base kit on which one puts extensions to make it useful for a particular use case. |
For simpler use cases WFS 3 Core is and should be useful as it is. For more advanced requirements / deployments extensions will be needed / essential. This includes several of the WFS 2 capabilities (which were often also in additional conformance classes). |
In WFS3 Core there's no way to specify the sorting order. Therefore paging is only really useful for "streaming" through the response in When sorting extension is added paging becomes much more meaningful. Then you can access the last page by flipping the sorting order and accessing the first page, so I'd still vote no for |
Do we need to keep this issue "re-opened"? I do not see any proposal emerging for a change to the specification. |
I'd prefer to have the 'extension' for startIndex be published and blessed by the WFS group before we close this issue. I've liked the idea of a simple core, but I think we need to prioritize publishing the set of core extensions that gets WFS3 rough 'parity' with the core of WFS2 asap, to cover many of its core use cases. I've been trying to push WFS 3 in STAC, and we hit this paging issue, and I tried to explain that the 'wfs 3 way' was to recommend startIndex, and that we should adopt the same. But then I had no place to point except this issue. It looks like there's actually a folder for extensions now, with one it for CRS https://github.com/opengeospatial/WFS_FES/tree/master/extensions which is a great improvement. Though there's a ton of boilerplate going on in that crs directory, is there a way to cut that down? |
Without explicit sorting, does pagination have much meaning? Pagination of an unsorted (or at least, not-explicitly-sorted) result seems like a useful way to break up a request that would otherwise be too large for client or server, but random page access to unsorted records doesn't seem like a use case for anything apart from achieving these small sequential payloads. A client cannot sort without access to the full collection. Given that sorting is not mentioned in the Core (I think; I've only just read the specification), is there a contract that a particular page has coherent content with its linked adjacent Sorting features of a collection by time (if temporal) and then secondarily by primary key seems like a suitable implementation for some servers and datasets, but obviously primary keys are not necessarily in any kind of semantic and/or alpha-numeric order so ideally function only as tie-breakers when sorting on some other property. Is time a default sort condition? And if so, what of features without temporal information: is it determined by their name? What of features with durations? Being able to bypass a server's maximum Another solution might be the ability to return a list of all pages upfront, thereby giving a client the opportunity to request pages in parallel rather than sequentially. This has advantages to the client beyond reducing the probability of data in a collection changing while paginating: including client-side latency, and UI advantages when aiming to render a Hypothetically, pages could include some information about their range (e.g. temporal extent if relevant), which would provide additional benefits to both human and computer interaction. For large collections this is still probably not feasible, since it may not be able to efficiently compute where page boundaries lie in a sorted collection beyond previous/next relations. However, given that a collection's feature count is known at request, as is the page Pages themselves could even take on an explicit spatial property, perhaps if pages are organised (features are sorted) into some kind of tessellated grid, like DGGS—at this point perhaps a client would prefer vector tiles. These are more rambling and obvious thoughts than coherent contributions. My point is really the same as @jampukka's:
To this I'd add:
|
The sorting discussion has been moved to #157. This issue can be closed, all open discussions have separate issues. |
Was discussing this topic with @tomkralidis, i wonder if we can reopen this discussion (here or in common), Ah, sorry i see the item is moved to #75. |
So I believe paging as it stands right now is a bit problematic. Hopefully a developer who has actually worked with paging on restful api's with elastic search backends sounds in and can explain more.
But in Planet's data API we don't have 'startIndex' or anything like that, since it's a lot more complexity to enable clients to request arbitrary locations in an index. Indeed if the index changes, with data added or deleted then that can throw off the results.
Planet solves this by requiring users to create a saved search, or there's a shortcut 'quicksearch' to get results right away. But both create an index on the server side that's set in time, that the user can page through. DigitalGlobe actually doesn't enable paging on their Catalog results, and the underlying vector data store seems to require a similar creation of a paging ID - https://gbdxdocs.digitalglobe.com/v1/docs/vs-retrieve-page-of-vector-items
I don't think we need to specify that level of paging / search in the core of the spec (though possibly an extension / best practice). But I think we should remove 'startIndex', in favor of just having results supply their 'next' link, which can be generated by the service. Planet's 'next' links look like https://api.planet.com/data/v1/searches/f8abab5007a14b31b5ccfb8a3d3f02d1/results?_page=eyJxdWVyeV9wYXJhbXMiOiB7fSwgInNvcnRfcHJldiI6IGZhbHNlLCAicGFnZV9zaXplIjogMjUwLCAic29ydF9ieSI6ICJwdWJ
(with a string that is actually 3 times that long).
In the STAC spec we just have an optional nextPageToken, and then the 'links' section uses that token.
Services wouldn't be required to use an obscure string for paging through results. Though we should make a recommendation on if services are expected to return consistent results, or if it is ok that they are returning like less than the 'count'. See like https://developers.facebook.com/blog/post/478/ for an api that doesn't guarantee the number of results.
Additional discussion on paging best practices at https://stackoverflow.com/questions/13872273/api-pagination-best-practices - we should perhaps recommend some default ordering of results that paging can be driven off of.
The other thing that seems a bit arbitrary is the 10000 maximum limit on the 'count' parameter. And I couldn't figure out in the spec if that is something that implementations can change, or is like hard coded in. At Planet we enable dynamic setting of the page size, but the limit is 250 results per page. And the operations team resists any increases to that, as it introduces more complexity to support.
The text was updated successfully, but these errors were encountered: