Problem statement
The importer currently processes each configured source serially, every 15 minutes. When sources have few or no new records, this is fine, as each source is quick to evaluate and to import a handful of new records from. However, many of these sources are quite slow to process if a full reimport is required, and this causes a run to blow out substantially. Sources later in the list are penalized by being stuck behind the slow source that is being reimported. As the number of sources continues to grow, serial processing will inevitably become slower and more prone to this problem.
Describe the solution you'd like
Instead, process the sources concurrently, to avoid the problems described above.
Describe alternatives you've considered
In varying degrees of complexity with various tradeoffs:
rely on Kubernetes to run the importer on each source as its own discrete Pod
have the importer spawn each source as its own child process
potentially use a worker pool architecture, so that the sources do not all run in parallel at once. This will become particularly important as the number of sources continues to grow.
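As a rough illustration of the worker pool option, here is a minimal sketch in Python. It assumes the per-source work is mostly I/O-bound and can be wrapped in a single function; `import_source` and `run_import` are hypothetical names, not part of the existing importer, and `max_workers=4` is an arbitrary placeholder.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def import_source(source):
    """Hypothetical stand-in for the importer's real per-source work.

    Returns the number of records imported for this source.
    """
    return 0


def run_import(sources, import_fn=import_source, max_workers=4):
    """Process sources concurrently, with at most max_workers running at once.

    A slow or failing source occupies only one worker, so the remaining
    sources keep flowing through the other workers.
    """
    results = {}  # source -> return value of import_fn
    errors = {}   # source -> exception raised by import_fn
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(import_fn, s): s for s in sources}
        for future in as_completed(futures):
            source = futures[future]
            try:
                results[source] = future.result()
            except Exception as exc:
                # Record the failure and carry on with the other sources.
                errors[source] = exc
    return results, errors
```

Capping `max_workers` is what keeps the run bounded as the source list grows. If the per-source work turned out to be CPU-bound, a process-based pool or one of the other alternatives above might be a better fit.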