-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
intersects or not slow #288
Comments
Do you have test data that triggers it? I'll try do to a new release soon and add the microbenchmarks. This will make it easier to reproduce/profile in isolation. |
Not, it's customer data but I can describe it. X polygons is rectangles, and Y is any S2Loop But I think it's concrete case in general I want to find some more effective way. I found CrossingProcessor created every time, if exist way to create it single time and when clear it became better. I think it can be class which inherit from S2BooleanOperation. In such case we at least can avoid allocate/free many times |
What do you think about add something like
Same possible for I think it's pretty common case when polygon contains only single loop? (Any single polygon without holes) |
It seems like it would be better if
Can you make a similar, synthetic test case not derived from the customer data? |
+1 to |
Sounds good if it will be faster. I think I can provide some benchmarks later. In general I tried to speedup arangodb searching geo data with inverted index (of course firstly I fixed not related to S2 issue). And know I have two issues:
So I also have question about 2, will be nice if you have some advice. The main pros of storing geo json it's more compact, especially for points (needs only 8*2 + 1 bytes). Also now I implemented serialization with s2 Decoder/Encoder and S2Region (we use it for precise comparison, after finding terms from S2TermIndexer) And I not sure, I see a lot of lazy classes which named like S2LaxPolygonShape, is it better to use? But how to intersect/etc they? |
I'm working on new bulk queries, probably the intersection query will be what you want, it'll be substantially faster than S2BooleanOperation which effectively computes the intersection then tells you if it's empty or not. I'll be writing that here in Q1, I've got all the pieces written and just need to integrate and test. As for formats, I'd recommend storing data directly in the encoded format. Build an S2Polygon (or S2LaxPolygonShape) and store that binary blob directly. We can decode those very quickly and lazily if need be. I don't see how GeoJson can be smaller since it's text based, unless you're throwing away almost all your precision and storing just two decimal points. If you want to do that, better to "snap" your geometry to a given S2Cell level. Two digits of precision is ~a level 9 cell (source), so you could snap to level 9 and use COMPACT encoding and the encoder can use the fact that the points are snapped to reduce the size (they can be stored as integers, level 9 would have 3+18+1=22 bits/point). S2LaxPolygon[Polyline]Shape are lax not lazy. They allow degeneracies such as degenerate edges (v0,v0), and sibling pairs (two reversed edges touching). They're meant to be more flexible and ultimately replace S2Polygon/S2Polyline, and they still work with things like S2BooleanOperation. |
Thanks for detailed answer.
We store GeoJson in binary representation, something like messagepack/bson/etc if you familiar (so it without some text overhead, like spaces, comma, etc) In average we need 16 byte per point (because GeoJson describe points like lat lon), and few additional bytes per array to store then it starts/ends + constant like 40 bytes. S2Polygon needs: So it's extremely bad for small polygons S2LaxPolygon Encode looks a lot of better So it better then lat, lon encoding GeoJson before 15~20 vertices, then it also worse but it's understandable Of course with CodingHint Compact it can be better for non convex polygons, but only a little In general I think we should use S2LaxPolygon instead of S2Polygon, it looks a lot of better, but translation is not easy |
I meant they have EncodedS2Lax* interface to decode them lazily |
Yes, but they need to compute index. For an example S2Polygon::Contains(Point) not always need to do it. I think my main issue: I want to compute exact yes or no. And one operand from user query, for it I can do heavy precompute, and it's size doesn't really matter. Another operand from index, in general from disk, and I already know it probably intersects (because I found it terms (terms from coverage)) Simplified: auto data1 = something heavy and precompute } |
Yeah the Lax classes are implementations of the new Shape API which is more flexible in general, S2Polygon is really legacy at this point, I'd encourage you to use Shapes whenever possible.
Ah yes they do, we have EncodedS2ShapeIndex as well so you can decode the index and the shapes in it lazily as well to minimize disk seeks. You can decode them all at once as well though.
We have s2shapeutil::ContainsBruteForce that can check point containment without an index. Intersection and containment are the same for a polygon/point but merely checking for vertex containment and lack of edge crossings isn't sufficient for polygons because they have interior that can escape, so you have to verify orientation as well, which makes it a fair bit harder. |
Thanks! |
I think main issue this todo
https://github.com/google/s2geometry/blob/master/src/s2/s2boolean_operation.cc#L2255
Will it be done sometime?
The text was updated successfully, but these errors were encountered: