-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Description
We have an Elastic DB where data flows in from different sources. This data contains an incremental field called timestamp, which keeps track of when the data was inserted into the database.
Our problem consists of two parts:
Bulk Data Transfer: This involves transferring old data as a one-time exercise, which we will handle on our end.
Real-Time Data Ingestion: We need Pathway to continuously send a query based on the incremental timestamp field to pull data from Elastic (e.g., data that is 5 seconds old) and ingest it into our Click-House database via Pathway.
To ensure no data is missed or duplicated, the query must be sent every 5 seconds to retrieve the most recent 5 seconds of data.
In case a large volume of data gets inserted within a short period (e.g., in the last 5 seconds), we may need to implement pagination to handle the data efficiently.