Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strengthen feature collection ingestion logic in weather-mv #385

Open
deepgabani8 opened this issue Aug 26, 2023 · 0 comments
Open

Strengthen feature collection ingestion logic in weather-mv #385

deepgabani8 opened this issue Aug 26, 2023 · 0 comments
Labels
enhancement New feature or request weather-mv

Comments

@deepgabani8
Copy link
Collaborator

The story / Current behavior
In the last step of weather-mv pipeline, the ingestion transform, each worker is ingesting its particular feature-collection. Our logic ensures that there is a room for a feature collection ingestion task in the earth engine task list & for that, each worker is checking if the task list has a room for its particular feature collection, and it is checking continuously, at a certain interval. This is the point, where we can optimize.

Suggested changes
In the ingestion transform, when it's feature collections, use flatMap to list all the feature collections coming from the above transform. Then pass this list as a single item to a ParDo function (meaning: run just a single worker) and this worker is now responsible to ingest all the feature collections. It does the same thing that current logic does. The benefit is, instead of all the workers checking for a room in the EE task list, only one worker is checking. Additionally, we can make the waiting time even more, and when it finds room, it creates tasks at once. For instance, the worker checks at one minute, and if it finds 5 vacancy, it creates 5 tasks at once, now that it has access to the entire feature collection list.

Benefits

  • Effectively, less load on the earthengine, less usage of EE query quota.
  • Less workers running (or waiting) at a time, which is the main benefit, it will save huge memory power, better for the environment :) i.e. NEXRAD pipeline ran for 3 days with constant 2TiB+ memory consumption, which could go down to 8 GiBs.
@deepgabani8 deepgabani8 added enhancement New feature or request weather-mv labels Aug 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request weather-mv
Projects
None yet
Development

No branches or pull requests

1 participant