Test location lon/lats for common source datasets (that represent areas, not points) #131

Open
riordan opened this issue Sep 30, 2016 · 1 comment


riordan commented Sep 30, 2016

Point data is frequently used to represent areas (e.g. the MaxMind GeoIP Farm From Hell). That's bullshit. Points are lies.

Building on the suggestion in #123: when a point comes from a dataset of fixed, published points, notify the user that it's from one of those common datasets.

These include:

This could be done using Bloom filters or cuckoo filters to test for membership. While cuckoo filters are newer, there appears to be a pretty nice implementation for Node. Using this, we could distribute very small models rather than the complete dataset, keeping the footprint for this test fairly small by comparison.
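To put a rough number on "very small", here is a back-of-the-envelope sketch using the standard Bloom filter sizing formulas. The point count and false-positive rate are hypothetical (the issue doesn't state dataset sizes), and `bloom_size` is just an illustrative helper name, not part of any library.

```python
import math


def bloom_size(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) for a standard Bloom filter.

    Uses the textbook formulas m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    """
    bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


# Hypothetical inputs: one million published points, 1% false positives.
n_points = 1_000_000
bits, hashes = bloom_size(n_points, 0.01)

filter_bytes = bits / 8
raw_bytes = n_points * 2 * 8  # lon and lat stored as two 64-bit floats

print(f"filter: {filter_bytes / 1e6:.1f} MB with {hashes} hash functions")
print(f"raw coordinates: {raw_bytes / 1e6:.1f} MB")
```

At a 1% false-positive rate this works out to just under 10 bits per point, regardless of how the coordinates themselves are encoded. A cuckoo filter would have different constants, so treat this as an order-of-magnitude estimate only.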

We'd do a 2-pass system for identifying common space->point data:

  1. One filter for ALL of the above points: Is this a common bad lon,lat?
  2. A filter for each of the above categories to notify the user which dataset it's likely from
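
A minimal sketch of the 2-pass lookup, in Python for brevity rather than Node. Everything here is an assumption for illustration: the tiny hand-rolled `BloomFilter`, the `point_key` rounding to 4 decimal places, the dataset names, and the sample coordinates (placeholders, not real dataset entries). A real version would use an existing Bloom or cuckoo filter library and ship prebuilt filters.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> list[int]:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def point_key(lon: float, lat: float, precision: int = 4) -> str:
    """Normalise a coordinate pair to a string key (assumed rounding)."""
    # Adding 0.0 turns -0.0 into 0.0 so both format identically.
    return (f"{round(lon, precision) + 0.0:.{precision}f},"
            f"{round(lat, precision) + 0.0:.{precision}f}")


def build_filters(datasets, num_bits=1024, num_hashes=7):
    """Build one combined filter plus one filter per dataset."""
    combined = BloomFilter(num_bits * max(1, len(datasets)), num_hashes)
    per_dataset = {}
    for name, points in datasets.items():
        dataset_filter = BloomFilter(num_bits, num_hashes)
        for lon, lat in points:
            key = point_key(lon, lat)
            dataset_filter.add(key)
            combined.add(key)
        per_dataset[name] = dataset_filter
    return combined, per_dataset


def classify(lon, lat, combined, per_dataset):
    """Pass 1: is this a known point at all? Pass 2: which dataset(s)?"""
    key = point_key(lon, lat)
    if key not in combined:
        return []  # definitely not one of the known points
    return [name for name, f in per_dataset.items() if key in f]


# Placeholder coordinates, for illustration only.
datasets = {
    "maxmind": [(10.0, 20.0)],
    "geonames": [(30.0, 40.0)],
    "postal_codes": [(50.0, 60.0)],
}
combined, per_dataset = build_filters(datasets)
print(classify(10.0, 20.0, combined, per_dataset))
print(classify(1.2345, 5.4321, combined, per_dataset))
```

Bloom filters never give false negatives, so a miss in pass 1 is conclusive. A hit in either pass is only "probably", which is why the wording to the user should stay at "likely from".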

newsroomdev commented Oct 14, 2016

I'm gonna break this up into some sub-tasks because there's quite a lot of data to be vacuumed up and sorted properly.

  • MaxMind
  • GeoNames
  • Postal Codes
    • TODO: munge Who's On First data. (The postal code GeoJSONs on GitHub seem to have 0,0 coordinates, example here. Could you provide some details on how to get this into a list of points to add to our filters @riordan?)
