importer: for scaling, process individual sources concurrently #2862

Open
andrewpollock opened this issue Nov 13, 2024 · 0 comments
Labels: backlog (Important but currently unprioritized), enhancement (New feature or request)


@andrewpollock
Contributor

Problem statement
The importer currently processes each configured source serially, every 15 minutes. When sources have few or no new records, this is fine: each source is quick to evaluate and to import a handful of new records from. However, many sources are slow to process when a full reimport is required, and this can make a run take substantially longer. Sources later in the list are penalized because they are stuck behind the slow source being reimported. As the number of sources continues to grow, serial processing will inherently become slower and more prone to this problem.

Describe the solution you'd like
Process the sources concurrently instead, so that one slow source does not delay the others.

Describe alternatives you've considered
These vary in complexity and carry different tradeoffs:

  • rely on Kubernetes to run the importer on each source as its own discrete Pod
  • have the importer spawn each source as its own child process
    • potentially use a worker-pool architecture so that not all sources run in parallel at once. This will become particularly important as the number of sources continues to grow.
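As a rough illustration of the worker-pool option, the Python sketch below bounds how many sources are processed at once using the standard library's `concurrent.futures`. The names `import_source` and `process_sources` are hypothetical stand-ins, not the importer's actual functions. The sketch uses threads for simplicity; `concurrent.futures.ProcessPoolExecutor` exposes the same interface if each source should instead run in its own child process (the worker function must then be picklable).

```python
"""Sketch: process importer sources concurrently with a bounded worker pool.

Assumptions (hypothetical, not the importer's real API):
  * `import_source` stands in for the per-source import logic.
  * Sources are identified by plain name strings.
"""
import concurrent.futures
import logging
import time
from typing import Callable, Dict, Iterable, Optional

logger = logging.getLogger(__name__)


def import_source(source_name: str) -> int:
    """Hypothetical per-source import; returns the number of records imported."""
    # Placeholder work: a real implementation would fetch and import records.
    time.sleep(0.01)
    return len(source_name)


def process_sources(
    sources: Iterable[str],
    worker: Callable[[str], int] = import_source,
    max_workers: int = 4,
) -> Dict[str, Optional[int]]:
    """Runs `worker` on every source with at most `max_workers` in flight.

    A slow source occupies only one worker slot, so sources later in the
    list are not stuck behind it. A failing source is logged and recorded
    as None instead of aborting the whole run.
    """
    results: Dict[str, Optional[int]] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every source; the pool caps how many run concurrently.
        futures = {pool.submit(worker, name): name for name in sources}
        # Collect results in completion order, not submission order.
        for future in concurrent.futures.as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:  # Isolate per-source failures.
                logger.exception("Import of source %s failed", name)
                results[name] = None
    return results


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(process_sources(["source-a", "source-b", "source-c"], max_workers=2))
```

The `max_workers` cap is what distinguishes this from simply running every source in parallel: as the number of sources grows, the pool size, rather than the source count, bounds resource usage.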