Geoparquet 1.1 spec compliance update #2271
C-Loftus
left a comment
A few comments on anything non-obvious.
| ' field of provider config' | ||
| LOGGER.error(msg) | ||
| raise Exception(msg) | ||
| raise ProviderGenericError(msg) |
Raises a specific error instead of a generic exception.
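As a rough illustration of why the specific error type helps, here is a minimal sketch. The `ProviderGenericError` class below is a local stand-in so the snippet runs on its own, and `parse_axis_field` is a hypothetical helper, not code from this PR.

```python
import logging

LOGGER = logging.getLogger(__name__)


class ProviderGenericError(Exception):
    """Local stand-in for the provider error type (assumption for this sketch)."""


def parse_axis_field(field_value):
    """Hypothetical helper: return (min, max) column names for one axis.

    Accepts either a single column name or a two-item [min, max] pair.
    """
    if isinstance(field_value, str):
        # A single column serves as both the minimum and the maximum.
        return field_value, field_value
    if isinstance(field_value, (list, tuple)) and len(field_value) == 2:
        return tuple(field_value)

    msg = 'Expected a column name or a [min, max] pair, got: ' + repr(field_value)
    LOGGER.error(msg)
    # Callers can catch this type without also swallowing unrelated exceptions.
    raise ProviderGenericError(msg)
```

A caller can then write `except ProviderGenericError:` and let unrelated failures such as `KeyError` or `TypeError` propagate, which a bare `except Exception:` would not allow.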
| self.maxy = self.y_field | ||
| else: | ||
| self.miny, self.maxy = self.y_field | ||
| self.bb = [self.minx, self.miny, self.maxx, self.maxy] |
This variable self.bb was unused
| :returns: generator of RecordBatch with the queried values | ||
| """ | ||
| scanner = pyarrow.dataset.Scanner.from_dataset(self.ds, **kwargs) | ||
| scanner = self.ds.scanner( |
To my understanding, pyarrow creates the scanner more efficiently when it is built by calling the scanner() method on the dataset object.
Feel free to let me know if you prefer removing use_threads or tweaking other kwargs here.
| :returns: dict of 0..n GeoJSON features | ||
| """ | ||
| result = None |
No need to store this as a variable since we always just return the result directly
@C-Loftus can you address the CI errors? Thanks in advance.
Fixed! Sorry I missed that.
Thanks for your review and feedback! I will prioritize addressing this as soon as I have some time, most likely early next week.
Thanks again for your review. I believe I should have all feedback addressed. The one extra change I made in the latest commit is to remove some of the dangling variables from
Overview
This PR addresses #2252
Related Issue / discussion
Closes #2252
Additional information
In the future it would be useful to change all code in this provider to skip the pandas serialization step. It is generally not necessary and adds extra overhead to convert to the intermediate format. However, for the sake of not making too many changes in one PR, I've left it as is.
I have not changed any of the code for s3fs. It is possible this could be tweaked for better performance.
Dependency policy (RFC2)
Updates to public demo
Contributions and licensing
(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)